IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

IN RE APPLICATION OF Group Art Unit: 

JOBLING ET AL. Examiner:
INTERNATIONAL APPLN. NO. PCT/GB97/03032 
INTERNATIONAL FILING DATE 4 NOVEMBER 1997 
S.N. 

FILED: CONCURRENTLY HEREWITH 

FOR: IMPROVEMENTS IN OR RELATING TO 

STARCH 

Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

PRELIMINARY AMENDMENT 

Sir: 

In the above-identified application, Applicant respectfully requests the following 
preliminary amendment be entered and the claims considered in light thereof. 



IN THE CLAIMS 

Amend claims 4-5, 8, 11, 14-15, 18, 20, 22-24, and 28-31 to read: 



4. (amended once) A nucleic acid sequence according to [any one of claims 1, 
2 or 3] claim 1 comprising a 5' and/or a 3' untranslated region. 

5. (amended once) A nucleic acid sequence according to claim 1 [any one of 
the preceding claims], encoding a polypeptide having the amino acid sequence NSKH at 
about residue 697. 

8. (amended once) A sequence according to claim 6 [or 7], comprising a 
5' and/or 3' untranslated region.

11. (amended once) A replicable nucleic acid construct comprising a nucleic
acid sequence according to claim 1 [any one of the preceding claims]. 

14. (amended once) A polypeptide according to claim 12 [or 13], having the 
amino acid sequence NSKH at about position 697. 

15. (amended once) A method of modifying starch in vitro, the method 
comprising treating starch to be modified under suitable conditions with an effective amount 
of a polypeptide according to claim 12 [any one of claims 12, 13 or 14]. 

18. (amended once) A method according to claim 16 [or 17], comprising the 
introduction of one or more further nucleic acid sequences, operably linked in the sense or 
anti-sense orientation to a suitable promoter active in the host cell, and causing transcription 
of the one or more further nucleic acid sequences, said transcripts and/or translation 
products thereof being sufficient to interfere with the expression of homologous gene(s) 
present in the host cell. 

20. (amended once) A method according to claim 18 [or 19], wherein the further 
nucleic acid sequence comprises at least part of an SBE I gene. 

22. (amended once) A method according to claim 16 [any one of claims
16-21], wherein the host cell is selected from one of the following: cassava, banana, potato, pea,
tomato, maize, wheat, barley, oat, sweet potato or rice. 

23. (amended once) A method according to claim 16 [any one of claims 16-22], 
wherein the altered host cell gives rise to starch having different properties compared to 
starch from an unaltered cell. 

24. (amended once) A method according to claim 16 [any one of claims 16-23], 
further comprising the step of growing the altered host cell into a plant or plantlet. 



28. (amended once) Starch obtainable from an altered plant according to claim 
26 [or 27], having altered properties compared to starch extracted from an equivalent but 
unaltered plant. 

29. (amended once) Starch obtained from an altered plant according to claim 
26 [or 27], having altered properties compared to starch extracted from an equivalent but 
unaltered plant. 

30. (amended once) Starch according to claim 28 [or 29] obtained from an 
altered plant selected from the group consisting of:- cassava, banana, potato, pea, tomato, 
maize, wheat, barley, oat, sweet potato and rice plants. 

31. (amended once) Starch according to claim 28 [any one of claims 28, 29 or
30], having increased amylose content compared to starch extracted from an equivalent but 
unaltered plant. 

Cancel claims 32-35. 
Add new claim 36 to read: 

36. A replicable nucleic acid construct comprising a nucleic acid sequence 
according to claim 6 [any one of the preceding claims].

STATUS OF THE CLAIMS 

Claims 1-35 were internationally filed in PCT/GB97/03032. 
Claims 4-5, 8, 11, 14-15, 18, 20, 22-24, and 28-31 were amended. 
Claims 32-35 have been canceled. 
Claim 36 has been added. 

Claims 1-31 and 36 are presented for consideration.

REMARKS

Claims 4-5, 8, 11, 14-15, 18, 20, 22-24, and 28-31 were amended to remove 
multiple dependencies. 

Claims 32-35 have been canceled as not in conformance with standard US patent
practice.

Claim 36 has been added based on original claim 11. No new matter has been
added.



In view of the foregoing, Applicant respectfully requests early action on this 
application. 



Respectfully submitted, 




Karen G. Kaiser 
Attorney for Applicants 
Reg. No. 33,506 



Dated: [illegible]



National Starch and Chemical Company 
P.O. Box 6500 

Bridgewater, NJ 08807-0500 
(908) 575-6152 







WO 98/20145 PCT/GB97/03032

09/297708



Title: Improvements in or Relating to Starch Content of Plants



Field of the Invention 

This invention relates to novel nucleic acid sequences, vectors and host cells comprising 
the nucleic acid sequence(s), to polypeptides encoded thereby, and to a method of altering
a host cell by introducing the nucleic acid sequence(s) of the invention. 

Background to the Invention 

Starch consists of two main polysaccharides, amylose and amylopectin. Amylose is a
linear polymer containing α-1,4 linked glucose units, while amylopectin is a highly
branched polymer consisting of an α-1,4 linked glucan backbone with α-1,6 linked glucan
branches. In most plant storage reserves amylopectin constitutes about 75% of the starch
content. Amylopectin is synthesized by the concerted action of soluble starch synthase and
starch branching enzyme [α-1,4 glucan: α-1,4 glucan 6-glycosyltransferase, EC 2.4.1.18].
Starch branching enzyme (SBE) hydrolyses α-1,4 linkages and rejoins the cleaved glucan,
via an α-1,6 linkage, to an acceptor chain to produce a branched structure. The physical
properties of starch are strongly affected by the relative abundance of amylose and 
amylopectin, and SBE is therefore a crucial enzyme in determining both the quantity and 
quality of starches produced in plant systems. 

Starches are commercially available from several plant sources including maize, potato and 
cassava. Each of these starches has unique physical characteristics and properties and a 
variety of possible industrial uses. In maize there are a number of naturally occurring 
mutants which have altered starch composition such as high amylopectin types ("waxy"
starches) or high amylose starches but in potato and cassava no such mutants exist on a
commercial basis as yet.

Genetic modification offers the possibility of obtaining new starches which may have novel
and potentially useful characteristics. Most of the work to date has involved potato plants
because they are amenable to genetic manipulation i.e. they can be transformed using
Agrobacterium and regenerated easily from tissue culture. In addition many of the genes
involved in starch biosynthesis have been cloned from potato and thus are available as 
targets for genetic manipulation, for example, by antisense inhibition of expression or 
sense suppression. 

Cassava (Manihot esculenta L. Crantz) is an important crop in the tropics, where its 
starch-filled roots are used both as a food source and increasingly as a source of starch. 
Cassava is a high yielding perennial crop that can grow on poor soils and is also tolerant 
of drought. Cassava starch, being a root-derived starch, has properties similar but not
identical to potato starch and is composed of 20-25% amylose and 75-80% amylopectin
(Rickard et al., 1991 Trop. Sci. 31, 189-207). Some of the genes involved in starch
biosynthesis have been cloned from cassava, including starch branching enzyme I (SBE
I) (Salehuzzaman et al., 1994 Plant Science 98, 53-62), and granule bound starch synthase
I (GBSS I) (Salehuzzaman et al., 1993 Plant Molecular Biology 23, 947-962) and some
work has been done on their expression patterns although only in in vitro grown plants
(Salehuzzaman et al., 1994 Plant Science 98, 53-62).

In most plants studied to date e.g. maize (Boyer & Preiss, 1978 Biochem. Biophys. Res.
Comm. 80, 169-175), rice (Smyth, 1988 Plant Sci. 57, 1-8) and pea (Smith, Planta 175,
270-279), two forms of SBE have been identified, each encoded by a separate gene. A
recent review by Burton et al. (1995 The Plant Journal 7, 3-15) has demonstrated that the
two forms of SBE constitute distinct classes of the enzyme such that, in general, enzymes
of the same class from different plants may exhibit greater similarity than enzymes of
different classes from the same plant. In their review, Burton et al. termed the two
respective enzyme families class "A" and class "B", and the reader is referred thereto (and
to the references cited therein) for a detailed discussion of the distinctions between the two
classes. One general distinction of note would appear to be the presence, in class A SBE
molecules, of a flexible N-terminal domain, which is not found in class B molecules. The
distinctions noted by Burton et al. are relied on herein to define class A and class B SBE.

Many organisations have interests in obtaining modified cassava starches by means of
genetic modification. This is impossible to achieve, however, unless the plant is amenable
to transformation and regeneration and the starch biosynthesis genes which are to be
targeted for modification have been cloned. The production of transgenic cassava plants has
only recently been demonstrated (Taylor et al., 1996 Nature Biotechnology 14, 726-730;
Schopke et al., 1996 Nature Biotechnology 14, 731-735; and Li et al., 1996 Nature
Biotechnology 14, 736-740). The present invention concerns the identification, cloning
and sequencing of a starch biosynthetic gene from Cassava, suitable as a target for genetic 
manipulation. 

Summary of the Invention 

In a first aspect the invention provides a nucleic acid sequence encoding a polypeptide 
having starch branching enzyme (SBE) activity, the polypeptide comprising an effective 
portion of the amino acid sequences shown in Figure 4 or Figure 13. The nucleic acid 
is conveniently in substantial isolation, especially in isolation from other naturally 
associated nucleic acid sequences. 

An "effective portion" of the amino acid sequences may be defined as a portion which 
retains sufficient SBE activity when expressed in E. coli KV832 to complement the
branching enzyme mutation therein. The amino acid sequences shown in Figures 4 and
13 include the N terminal transit peptide, which comprises about the first 50 amino acid
residues. As those skilled in the art will be well aware, such a transit peptide is not
essential for SBE activity. Thus the mature polypeptide, lacking a transit peptide, may 
be considered as one example of an effective portion of the amino acid sequence shown 
in Figure 4 or Figure 13. 

Other effective portions may be obtained by effecting minor deletions in the amino acid 
sequence, whilst substantially preserving SBE activity. Comparison with known class A 
SBE sequences, with the benefit of the disclosure herein, will enable those skilled in the
art to identify regions of the polypeptide which are less well conserved and so amenable
to minor deletion, or amino acid substitution (particularly, conservative amino acid
substitution) whilst substantially preserving SBE activity. Such less well-conserved
regions are generally found in the N terminal amino acid residues (up to the triple proline
"elbow" at residues 138-140 in Figure 4 and up to the proline elbow at residues 143-145 
in Figure 13) and in the last 50 residues or so of the C terminal, and in particular in the 
acidic tail of the C terminal. 

Conveniently the nucleic acid sequence is obtainable from cassava, preferably obtained 
therefrom, and typically encodes a polypeptide obtainable from cassava. In a particular 
embodiment, the encoded polypeptide may have the amino acid sequence NSKH at about 
position 697 (in relation to Figure 4), which sequence appears peculiar to an isoform of 
the SBE class A enzyme of cassava, other class A SBE enzymes having the conserved 
sequence DA D/E Y (Burton et al., 1995 cited above).
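
By way of illustration only, the distinction drawn above between the NSKH sequence and the conserved DA D/E Y sequence may be sketched in Python as follows. The function name, the window used to represent "about position 697" and the toy sequences are assumptions made for this sketch and form no part of the disclosure; the actual sequences are those shown in Figure 4.

```python
import re

# Conserved class A motif "DA D/E Y" written as a regular expression,
# and the variant NSKH described in the text.
CONSERVED = re.compile(r"DA[DE]Y")
VARIANT = "NSKH"

def motif_near(protein, position=697, window=10):
    """Classify the motif found near a 1-based residue position.

    `window` is a hypothetical tolerance standing in for the word "about".
    """
    start = max(0, position - 1 - window)
    end = min(len(protein), position - 1 + window + 4)
    region = protein[start:end]
    if VARIANT in region:
        return "NSKH"
    if CONSERVED.search(region):
        return "conserved"
    return "none"

# Toy sequences (not real SBE sequences): 696 filler residues, then a motif.
toy_variant = "G" * 696 + "NSKH" + "G" * 100
toy_conserved = "G" * 696 + "DAEY" + "G" * 100
print(motif_near(toy_variant))
print(motif_near(toy_conserved))
```

In practice the position would be read from an alignment against Figure 4 rather than from raw residue numbering, since insertions or deletions upstream would shift the motif.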

In a particular aspect of the invention there is provided a nucleic acid comprising a portion 
of nucleotides 21 to 2531 of the nucleic acid sequence shown in Figure 4, or a functionally 
equivalent nucleic acid sequence. Such functionally equivalent nucleic acid sequences 
include, but are not limited to, those sequences which encode substantially the same amino 
acid sequence but which differ in nucleotide sequence from that shown in Figure 4 by 
virtue of the degeneracy of the genetic code. For example, a nucleic acid sequence may 
be altered (e.g. "codon optimised") for expression in a host other than cassava, such that
the nucleotide sequence differs substantially whilst the amino acid sequence of the encoded
polypeptide is unchanged. Other functionally equivalent nucleic acid sequences are those
which will hybridise under stringent hybridisation conditions (e.g. as described by
Sambrook et al., Molecular Cloning: A Laboratory Manual, CSH, i.e. washing with
0.1xSSC, 0.5% SDS at 68°C) with the sequence shown in Figure 4. Figure 10 shows a
functionally equivalent sequence designated "125 + 94", which includes a region
corresponding to the 3' coding portion of the sequence in Figure 4. Figure 13 shows a 
functionally equivalent sequence which comprises a second complete SBE coding sequence 
(the SBE-derived sequence is from nucleotides 35 to 2760, of which the coding sequence 
is nucleotides 131-2677, the rest of the sequence in the figure is vector-derived). 
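
The effect of the degeneracy of the genetic code referred to above may be illustrated by the following minimal Python sketch. The two short DNA fragments are hypothetical and are not taken from any of the Figures; the codon table is a small excerpt of the standard genetic code, limited to the codons used here.

```python
# Excerpt of the standard genetic code (only the codons used below).
CODONS = {
    "AAT": "N", "AAC": "N",
    "TCT": "S", "AGC": "S",
    "AAA": "K", "AAG": "K",
    "CAT": "H", "CAC": "H",
}

def translate(dna):
    """Translate a DNA string codon by codon using the excerpt above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

def differing_positions(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

seq_a = "AATTCTAAACAT"  # hypothetical fragment encoding N-S-K-H
seq_b = "AACAGCAAGCAC"  # the same peptide written with synonymous codons

print(translate(seq_a), translate(seq_b))
print(differing_positions(seq_a, seq_b), "of", len(seq_a), "nucleotides differ")
```

Both fragments translate to the same four residues although half of their nucleotides differ, which is the sense in which a codon-optimised sequence may differ substantially in nucleotide sequence while leaving the encoded polypeptide unchanged.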

Functionally equivalent DNA sequences will preferably comprise at least 200-300bp, more
preferably 300-600bp, and will exhibit at least 88% identity (more preferably at least 90%,
and most preferably at least 95% identity) with the corresponding region of the DNA
sequence shown in Figures 4 or 10. Those skilled in the art will readily be able to conduct
a sequence alignment between the putative functionally equivalent sequence and those 
detailed in Figures 4 or 10 - the identity of the two sequences is to be compared in those 
regions which are aligned by standard computer software, which aligns corresponding 
regions of the sequences. 
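
The identity comparison described above may be sketched as follows, assuming that a pairwise alignment has already been produced by alignment software and is supplied as two equal-length rows with "-" marking gaps. Excluding gapped columns from the calculation is an assumption of this sketch; different software packages treat gaps differently, and the text does not specify a convention. The function names and the toy alignment are likewise illustrative only.

```python
def aligned_identity(aln_a, aln_b, gap="-"):
    """Percent identity over columns where both aligned rows have a base.

    Returns a tuple (percent identity, number of aligned columns).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not columns:
        return 0.0, 0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns), len(columns)

def meets_criteria(aln_a, aln_b, min_length=200, min_identity=88.0):
    """Apply the length and identity thresholds quoted in the text."""
    identity, length = aligned_identity(aln_a, aln_b)
    return length >= min_length and identity >= min_identity

# Toy alignment: 250 aligned columns, 20 of which are mismatched.
ref = "A" * 250
query = "A" * 230 + "C" * 20
print(aligned_identity(ref, query))
print(meets_criteria(ref, query))
```

The default thresholds correspond to the 200bp and 88% figures given above; the more preferred 90% and 95% levels can be tested by passing a different `min_identity`.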

In particular embodiments the nucleic acid sequence may alternatively comprise a 5'
and/or a 3' untranslated region ("UTR"), examples of which are shown in Figures 2 and
4. Figure 9 includes a 3' UTR as nucleotides 688-1044 and Figure 10 includes a 3' UTR
as nucleotides 1507-1900 (which nucleotides correspond to the first base after the "stop"
codon to the base immediately preceding the poly (A) tail). Any one of the sequences
defined above, or a functional equivalent thereof (as defined by hybridisation properties,
as set out in the preceding paragraph), could be useful in sense or anti-sense inhibition of
corresponding genes, as will be apparent to those skilled in the art. It will also be
apparent to those skilled in the art that such regions may be modified so as to optimise
expression in a particular type of host cell and that the 5' and/or 3' UTRs could be used
in isolation, or in combination with a coding portion of the sequence of the invention.
Similarly, a coding portion could be used without a 5' or a 3' UTR if desired.
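
The nucleotide ranges quoted in this specification are 1-based and inclusive. As a convenience for readers handling the sequences electronically, the following Python sketch shows how such ranges map onto lengths and onto zero-based slices. The helper names and the toy sequence are assumptions of this sketch; the coordinates are those stated in the text for Figures 9, 10 and 13.

```python
def region_length(first, last):
    """Length of a region given 1-based inclusive nucleotide coordinates."""
    return last - first + 1

def extract(sequence, first, last):
    """Return the region first..last (1-based, inclusive) from a sequence."""
    return sequence[first - 1:last]

# Coordinates quoted in the text.
regions = {
    "Figure 9 3' UTR": (688, 1044),
    "Figure 10 3' UTR": (1507, 1900),
    "Figure 13 coding sequence": (131, 2677),
}
for name, (first, last) in regions.items():
    print(name, region_length(first, last))

# Toy sequence, used only to show the slicing convention.
toy = "AACCGGTT"
print(extract(toy, 3, 6))
```

On these coordinates the Figure 13 coding sequence spans 2547 nucleotides, which is a whole number of codons (849), as would be expected for a coding region.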

In a further aspect, the invention provides a replicable nucleic acid construct comprising 
any one of the nucleic acid sequences defined above. The construct will typically 
comprise a selectable marker and may allow for expression of the nucleic acid sequence 
of the invention. Conveniently the vector will comprise a promoter (especially a promoter 
sequence operable in a plant and/or a promoter operable in a bacterial cell) and one or 
more regulatory signals known to those skilled in the art. 

In another aspect the invention provides a polypeptide having SBE activity, the polypeptide 
comprising an effective portion of the amino acid sequence shown in Figure 4 or Figure 
13. The polypeptide is conveniently one obtainable from cassava, although it may be
derived using recombinant DNA techniques. The polypeptide is preferably in substantial
isolation from other polypeptides of plant origin, and more preferably in substantial
isolation from any other polypeptides. The polypeptide may have amino acid residues
NSKH at about position 697 (in the sequence shown in Figure 4), instead of the sequence
DA D/E Y found in other SBE class A polypeptides. The polypeptide may be used in a 
method of modifying starch in vitro, the method comprising treating starch under suitable 
conditions (of temperature, pH etc.) with an effective amount of the polypeptide. 

Those skilled in the art will appreciate that the disclosure of the present specification can 
be utilised in a number of ways. In particular, the characteristics of a host cell may be 
altered by recombinant DNA techniques. Thus, in a further aspect, there is provided a 
method by which a host cell may be altered by introduction of a nucleic acid sequence 
comprising at least 200bp and exhibiting at least 88% sequence identity (more preferably 
at least 90%, and most preferably at least 95% identity) with the corresponding region of
the DNA sequence shown in Figures 4, 9, 10 or 13, operably linked in the sense or 
(preferably) in the anti-sense orientation to a suitable promoter active in the host cell, and 
causing transcription of the introduced nucleic acid sequence, said transcript and/or the 
translation product thereof being sufficient to interfere with the expression of a 
homologous gene naturally present in said host cell, which homologous gene encodes a 
polypeptide having SBE activity. The altered host cell is typically a plant cell, such as a 
cell of a cassava, banana, potato, sweet potato, tomato, pea, wheat, barley, oat, maize,
or rice plant. 

Desirably the method further comprises the introduction of one or more nucleic acid 
sequences which are effective in interfering with the expression of other homologous gene 
or genes naturally present in the host cell. Such other genes whose expression is inhibited 
may be involved in starch biosynthesis (e.g. an SBE I gene), or may be unrelated to SBE 
II. 

Those skilled in the art will be aware that both anti-sense inhibition and "sense
suppression" of expression of genes, especially plant genes, have been demonstrated (e.g.
Matzke & Matzke 1995 Plant Physiol. 107, 679-685).

It is believed that antisense methods are mainly operable by the production of antisense
mRNA which hybridises to the sense mRNA, preventing its translation into functional
polypeptide, possibly by causing the hybrid RNA to be degraded (e.g. Sheehy et al., 1988
PNAS 85, 8805-8809; Van der Krol et al., Mol. Gen. Genet. 220, 204-212). Sense
suppression also requires homology between the introduced sequence and the target gene, 
but the exact mechanism is unclear. It is apparent however that, in relation to both 
antisense and sense suppression, neither a full length nucleotide sequence, nor a "native" 
sequence is essential. Preferably the nucleic acid sequence used in the method will 
comprise at least 200-300bp, more preferably at least 300-600bp, of the full length 
sequence, but by simple trial and error other fragments (smaller or larger) may be found 
which are functional in altering the characteristics of the plant. It is also known that 
untranslated portions of sequence can suffice to inhibit expression of the homologous gene 
- coding portions may be present within the introduced sequence, but they do not appear 
to be essential under all circumstances. 
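
The relationship between a sense sequence and the corresponding antisense sequence may be illustrated by the following minimal Python sketch, in which the antisense strand is taken as the reverse complement of a chosen region of a cDNA. The function names, the coordinate convention (1-based, inclusive) and the toy sequence are assumptions of this sketch; the choice of fragment in practice is a matter of trial and error, as noted above.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def antisense_fragment(cdna, first, last):
    """Reverse complement of the 1-based inclusive region first..last."""
    return reverse_complement(cdna[first - 1:last])

toy_cdna = "ATGGCCAAGT"  # toy sequence, not taken from any Figure
print(reverse_complement(toy_cdna))
print(antisense_fragment(toy_cdna, 1, 6))
```

A fragment produced in this way, when placed downstream of a promoter, would be transcribed into RNA complementary to the sense transcript over the chosen region.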

The inventors have discovered that there are at least two class A SBE genes in cassava. 
A fragment of a second gene has been isolated, which fragment directs the expression of 
the C terminal 481 amino acids of cassava class A SBE (see Figure 10) and comprises a 
3' untranslated region. Subsequently, a complete clone of the second gene was also
recovered (see Figure 12). The coding portions of the two genes show some slight
differences, and the second SBE gene may be considered as functionally equivalent to the
corresponding portion of the nucleotide sequence shown in Figure 4. However, the 3'
untranslated regions of the two genes show marked differences. Thus the method of 
altering a host cell may comprise the use of a sufficient portion of either gene so as to 
inhibit the expression of the naturally occurring homologous gene. Conveniently, a 
portion of nucleotide sequence is employed which is conserved between both genes. 
Alternatively, sufficient portions of both genes may be employed, typically using a single 
construct to direct the transcription of both introduced sequences. 

In addition, as explained above, it may be desired to cause inhibition of expression of the 
class B SBE (i.e. SBE I) in the same host cell. A number of class B SBE gene sequences
are known, including portions of the cassava class B SBE (Salehuzzaman et al., 1994
Plant Science 98, 53-62) and any one of these may prove suitable. Preferably the
sequence used is that which derives from the host cell sought to be altered (e.g. when 
altering the characteristics of a cassava plant cell, it is generally preferred to use sense or 
anti-sense sequences corresponding exactly to at least portions of the cassava gene whose 
expression is sought to be inhibited). 

In a further aspect the invention provides an altered host cell, into which has been 
introduced a nucleic acid sequence comprising at least 200bp and exhibiting at least 88%
sequence identity (more preferably at least 90%, and most preferably at least 95% identity)
with the corresponding region of the DNA sequence shown in Figures 4, 9, 10 or 13, 
operably linked in the sense or anti-sense orientation to a suitable promoter, said host cell 
comprising a natural gene sharing sequence homology with the introduced sequence. 

The host cell may be a micro-organism (such as a bacterial, fungal or yeast cell) or a plant 
cell. Conveniently the host cell altered by the method is a cell of a cassava plant, or
another plant with starch storage reserves, such as banana, potato, sweet potato, tomato, 
pea, wheat, barley, oat, maize, or rice plant. Typically the sequence will be introduced 
in a nucleic acid construct, by way of transformation, transduction, micro-injection or 
other method known to those skilled in the art. The invention also provides for a plant 
into which has been introduced a nucleic acid sequence of the invention, or the progeny 
of such a plant. 

The altered plant cell will preferably be grown into an altered plant, using techniques of 
plant growth and cultivation well-known to those skilled in the art of re-generating 
plantlets from plant cells. 

The invention also provides a method of obtaining starch from an altered plant, the plant 
being obtained by the method defined above. Starch may be extracted from the plant by 
any of the known techniques (e.g. milling). The invention further provides starch 
obtainable from a plant altered by the method defined above, the starch having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 
Conveniently the altered starch is obtained from an altered plant selected from the group
consisting of cassava, potato, pea, tomato, maize, wheat, barley, oat, sweet potato and
rice. Typically the altered starch will have increased amylose content. 

The invention will now be further described by way of illustrative examples and with
reference to the accompanying drawings, in which:- 

Figure 1 is a schematic illustration of the cloning strategy for cassava SBE II. The top line 
represents the size of a full length clone with distances in kilobases (kb) and arrows 
representing oligonucleotides (rightward pointing arrows are sense strand, leftward are on 
opposite strand). The long thick arrow is the open reading frame with start and stop 
codons shown. Below this are shown the 3' RACE, 5' RACE and PCR clones identified
either by the plasmid name (shown in brackets above the line) or the clone number (shown 
to the left of the clone) for the 5' RACE only. Also shown (by an x) in the 5' RACE 
clones are positions of small deletions or introns. 

Figure 2 shows the DNA sequence and predicted ORF of csbe2con.seq. This sequence
is a consensus of 3' RACE pSJ94 and 5' RACE clones 27/9, 11 and 28. The first 64 base
pairs are derived from the RoRidT17 adaptor primer/dT tail followed by the SBE
sequence. The one long open reading frame is shown in one letter code below the double 
strand DNA sequence. Also shown is the upstream ORF (MQL...LPW). 

Figure 3 shows an alignment of the 5' region of cassava SBE II csbe2con and pSJ99 
(clones 20 and 35) DNA sequences. Differences from the consensus sequence are shaded. 

Figure 4 shows the DNA sequence and predicted ORF of full length cassava SBE II tuber
cDNA in pSJ107. The sequence shown is from the CSBE214 to the CSBE218 
oligonucleotide. The DNA sequence is sequence ID No. 28 in the attached sequence 
listing; the amino acid sequence is Seq ID No. 29. 

Figure 5 shows an alignment of 3' region of cassava SBE II pSJ116 and 125 + 94 DNA 
sequences. The top line is the 125 + 94 sequence and the bottom line the pSJ116 sequence.
Identical nucleotides are indicated by the same letter in the middle line, differences are
indicated by a gap, and dashed lines indicate gaps introduced to optimise alignment.

Figure 6 shows an alignment of carboxy terminal region of pSJ116 and 125 + 94 protein
sequences. The top sequence is from 125 + 94 and the bottom from pSJ116. Identical
amino acid residues are shown with the same letter, conserved changes with a colon and 
neutral changes with a period. 

Figure 7 shows a phylogenetic tree of starch branching enzyme proteins. The length of 
each pair of branches represents the distance between sequence pairs. The scale beneath 
the tree measures the distance between sequences (units indicate the number of substitution 
events). Dotted lines indicate a negative branch length because of averaging the tree. 
Zmconl2.pro is maize SBE II, psstbl.pro is pea SBE I (Bhattacharyya et al., 1990 Cell 60,
115-121) and atsbe2-1 & 2-2.pro are two SBE II proteins from Arabidopsis thaliana
(Fisher et al., 1996 Plant Mol. Biol. 30, 97-108). SJ107.pro is representative of a cassava
SBE II sequence, and potsbe2.pro is a potato SBE II sequence known to the inventors. 

Figure 8 is an alignment of SBE II proteins. Protein sequences are indicated in one letter 
code. The top line represents the consensus sequence, below which is shown the 
consensus ruler and the individual SBE II sequences. Residues matching the consensus 
are shaded. Dashes represent gaps introduced to optimise alignment. Sequence identities 
are shown at the right of the figure and are as Figure 7, except that SJ107.pro is cassava
SBE II.

Figure 9 shows the DNA sequence and predicted ORF of a cassava SBE II cDNA isolated 
by 3' RACE (plasmid pSJ101).

Figure 10 shows the consensus DNA sequence and predicted ORF of a second cassava 
SBE II cDNA isolated by 3' and 5' RACE (sequence designated 125 + 94 is from plasmid 
pSJ125 and pSJ94, spliced at the CSBE217 oligo sequence).

Figure 11 is a schematic diagram of the plant transformation vector pSJ64. The black line 
represents the DNA sequence. The hashed line represents the bacterial plasmid backbone
(containing the origin of replication and bacterial selection marker) and is not shown in
full. The filled triangles represent the T-DNA borders (RB = right border, LB = left
border). Relevant restriction enzyme sites are shown above the black line with the
approximate distances (in kilobases) between sites marked by an asterisk shown
underneath. The thinnest arrows represent polyadenylation signals (pAnos = nopaline 
synthase, pAg7 = Agrobacterium gene 7), the intermediate arrows represent protein 
coding regions (SBE II = cassava SBE II, HYG = hygromycin resistance gene) and the 
thick arrows represent promoter regions (P-2x35S = double CaMV 35S promoter, P-nos 
= nopaline synthase promoter). 

Figure 12 is a schematic illustration of the cloning strategy used to isolate a second 
cassava SBE II gene. The top line represents the size of a full length clone with distances 
in kilobases (kb) and arrows representing oligonucleotides (rightward pointing arrows are 
sense strand, leftward are on opposite strand). The long thick arrow is the open reading 
frame with start and stop codons shown. Below this are shown the 3' RACE, 5' RACE
and PCR clones identified either by the plasmid name (shown in brackets above the line) 
or the clone number (shown to the right of the clone). 

Figure 13 shows the DNA sequence and predicted ORF of a second full length cassava 
SBE II tuber cDNA in pSJ146. Nucleotides 35-2760 are SBE II sequence and the 
remainder are from the pT7Blue vector. The DNA sequence of Figure 13 is Seq ID No. 
30, and the amino acid sequence is Seq ID No. 31, in the attached sequence listing. 

Example 1 

This example relates to the isolation and cloning of SBE II sequences from cassava.

Recombinant DNA manipulations

Standard procedures were performed essentially according to Sambrook et al. (1989
Molecular Cloning: A Laboratory Manual, 2nd edn., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y.). DNA sequencing was performed on an ABI automated DNA
sequencer and sequences manipulated using DNASTAR software for the Macintosh.

Rapid Amplification of cDNA ends (RACE) and PCR conditions

5' and 3' RACE were performed essentially according to Frohman et al. (1988 Proc.
Natl. Acad. Sci. USA 85, 8998-9002) but with the following modifications. 

For 3' RACE, 5 µg of total RNA was reverse transcribed using 5 pmol of the RACE
adaptor RoRidT17 as primer and Stratascript RNAse H- reverse transcriptase (50 U) in
a 50 µl reaction according to the manufacturer's instructions (Stratagene). The reaction
was incubated for 1 hour at 37°C and then diluted to 200 µl with TE (10 mM Tris HCl,
1 mM EDTA) pH 8 and stored at 4°C. 2.5 µl of this cDNA was used in a 25 µl PCR
reaction with 12.5 pmol of SBE A and Ro primers for 30 cycles of 94°C 45 sec, 50°C
25 sec, 72°C 1 min 30 sec. A second round of PCR (25 cycles) was performed using 1
µl of this reaction as template in a 50 µl reaction under the same conditions. Amplified
products were separated by agarose gel electrophoresis and cloned into the pT7Blue vector 
(Invitrogen). 
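
The quantities used in the 3' RACE procedure above can be checked with a short calculation, sketched here in Python. The sketch assumes that 12.5 pmol refers to each primer, expresses the template as the amount of input RNA it derives from (the true cDNA yield is not stated), and ignores ramp times between temperature steps; the function names are illustrative only.

```python
def micromolar(pmol, volume_ul):
    """Concentration in µM: pmol per µl is numerically equal to µmol per litre."""
    return pmol / volume_ul

def cycling_minutes(cycles, step_seconds):
    """Total time spent in the programmed steps, ignoring ramp times."""
    return cycles * sum(step_seconds) / 60

# 5 µg of RNA reverse transcribed, then diluted to a final 200 µl with TE.
rna_equiv_ng_per_ul = 5 * 1000 / 200
# 2.5 µl of the diluted cDNA used as template in the first-round PCR.
template_rna_equiv_ng = rna_equiv_ng_per_ul * 2.5
# 12.5 pmol of primer in a 25 µl reaction (assumed to be per primer).
primer_um = micromolar(12.5, 25)
# 30 cycles of 45 sec, 25 sec and 1 min 30 sec.
first_round_min = cycling_minutes(30, [45, 25, 90])

print(rna_equiv_ng_per_ul, "ng RNA equivalent per µl")
print(template_rna_equiv_ng, "ng RNA equivalent per PCR")
print(primer_um, "µM primer")
print(first_round_min, "min of programmed cycling")
```

On these assumptions each first-round reaction receives cDNA derived from 62.5 ng of RNA, each primer is present at 0.5 µM, and the thirty cycles occupy 80 minutes of programmed time.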

For the first round of 5' RACE, 5 µg of total leaf RNA was reverse transcribed as
described above using 10 pmol of the SBE II gene specific primer CSBE22. This primer
was removed from the reaction by diluting to 500 µl with TE and centrifuging twice
through a Centricon 100 microconcentrator. The concentrated cDNA was then dA-tailed
with 9 U of terminal deoxynucleotide transferase and 50 µM dATP in a 20 µl reaction in
buffer supplied by the manufacturer (BRL). The reaction was incubated for 10 min at
37°C and 5 min at 65°C and then diluted to 200 µl with TE pH 8. PCR was performed
in a 50 µl volume using 5 µl of tailed cDNA, 2.5 pmol of RoRidT17 and 25 pmol of Ro
and CSBE24 primers for 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 3 min. Amplified
products were separated on a 1% TAE agarose gel, cut out, 200 µl of TE was added and
melted at 99°C for 10 min. Five µl of this was re-amplified in a 50 µl volume using
CSBE25 and Ri as primers and 25 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 1 min 30
sec. Amplified fragments were separated on a 1% TAE agarose gel, purified on DEAE
paper and cloned into pT7Blue. 

The second round of 5' RACE was performed using CSBE28 and 29 primers in the first 
and second round PCR reactions respectively using a new A-tailed cDNA library primed 
with CSBE27. 

A third round of 5' RACE was performed on the same CSBE27 primed cDNA. 

Repeat 3' RACE and PCR Cloning 

The 3' RACE library (RoRidT17 primed leaf RNA) was used as a template. The first PCR 
reaction was diluted 1:20 and 1 µl was used in a 50 µl PCR reaction with SBE A and Ri 
primers and the products were cloned into pT7Blue. The cloned PCR products were 
screened for the presence or absence of the CSBE23 oligo by colony PCR. 

A full length cDNA of cassava SBE II was isolated by PCR from leaf or root cDNA 
(RoRidT17 primed) using primers CSBE214 and CSBE218 from 2.5 µl of cDNA in a 25 
µl reaction and 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 2 min. 

Complementation of E. coli mutant KV832 

SBE II containing plasmids were transformed into the branching enzyme deficient mutant 
E. coli KV832 (Keil et al., 1987 Mol. Gen. Genet. 207, 294-301) and cells grown on 
solid PYG media (0.85% KH2PO4, 1.1% K2HPO4, 0.6% yeast extract) containing 1.0 
% glucose. To test for complementation, a loop of cells was scraped off and resuspended 
in 150 µl water to which was added 15 µl of Lugol's solution (2 g KI and 1 g I2 per 300 
ml water). 

RNA isolation 

RNA was isolated from cassava plants by the method of Logemann (1987 Anal. Biochem. 
163, 21-26). Leaf RNA was isolated from 0.5 gm of in vitro grown plant tissue. The 
total yield was 300 µg. Three month old roots (88 gm) were used for isolation of root 
RNA. 

SBE II specific oligonucleotides 

SBE A     ATGGACAAGGATATGTATGA  (Seq ID No. 1) 
CSBE21    GGTTTCATGACTTCTGAGCA  (Seq ID No. 2) 
CSBE22    TGCTCAGAAGTCATGAAACC  (Seq ID No. 3) 
CSBE23    TCCAGTCTCAATATACGTCG  (Seq ID No. 4) 
CSBE24    AGGAGTAGATGGTCTGTCGA  (Seq ID No. 5) 
CSBE25    TCATACATATCCTTGTCCAT  (Seq ID No. 6) 
CSBE26    GGGTGACTTCAATGATGTAC  (Seq ID No. 7) 
CSBE27    GGTGTACATCATTGAAGTCA  (Seq ID No. 8) 
CSBE28    AATTACTGGCTCCGTACTAC  (Seq ID No. 9) 
CSBE29    CATTCCAACGTGCGACTCAT  (Seq ID No. 10) 
CSBE210   TACCGGTAATCTAGGTGTTG  (Seq ID No. 11) 
CSBE211   GGACCTTGGTTTAGATCCAA  (Seq ID No. 12) 
CSBE212   ATGAGTCGCACGTTGGAATG  (Seq ID No. 13) 
CSBE213   CAACACCTAGATTACCGGTA  (Seq ID No. 14) 
CSBE214   TTAGTTGCGTCAGTTCTCAC  (Seq ID No. 15) 
CSBE215   AATATCTATCTCAGCCGGAG  (Seq ID No. 16) 
CSBE216   ATCTTAGATAGTCTGCATCA  (Seq ID No. 17) 
CSBE217   TGGTTGTTCCCTGGAATTAC  (Seq ID No. 18) 
CSBE218   TGCAAGGACCGTGACATCAA  (Seq ID No. 19) 
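
Several of the oligonucleotides listed above appear to be exact reverse complements of one another, i.e. the same site primed on opposite strands. The Python sketch below checks this for a subset of the primers. The helper names are illustrative, and the pairing is an observation drawn from the listed sequences rather than a statement made in the specification.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Subset of the primers listed above (Seq ID Nos. 1-4, 6, 10, 11, 13, 14).
PRIMERS = {
    "SBE A": "ATGGACAAGGATATGTATGA",
    "CSBE21": "GGTTTCATGACTTCTGAGCA",
    "CSBE22": "TGCTCAGAAGTCATGAAACC",
    "CSBE23": "TCCAGTCTCAATATACGTCG",
    "CSBE25": "TCATACATATCCTTGTCCAT",
    "CSBE29": "CATTCCAACGTGCGACTCAT",
    "CSBE210": "TACCGGTAATCTAGGTGTTG",
    "CSBE212": "ATGAGTCGCACGTTGGAATG",
    "CSBE213": "CAACACCTAGATTACCGGTA",
}


def complementary_pairs(primers: dict) -> list:
    """List primer name pairs whose sequences are exact reverse complements."""
    names = list(primers)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if reverse_complement(primers[first]) == primers[second]:
                pairs.append((first, second))
    return pairs


for first, second in complementary_pairs(PRIMERS):
    print(f"{first} is the reverse complement of {second}")
```

For this subset the check pairs SBE A with CSBE25, CSBE21 with CSBE22, CSBE29 with CSBE212, and CSBE210 with CSBE213, while CSBE23 has no partner.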


RESULTS 

Cloning of a SBE II gene from cassava leaf 

The strategy for cloning a full length cDNA of starch branching enzyme II of cassava is 
shown in Figure 1. A comparison of several SBE II (class A) SBE DNA sequences 
identified a 23 bp region which appears to be completely conserved among most genes 
(data not shown) and is positioned about one kilobase upstream from the 3' end of the 
gene. An oligonucleotide primer (designated SBE A) was made to this sequence and used 
to isolate a partial cDNA clone by 3' RACE PCR from first strand leaf cDNA as 
illustrated in Figure 1. An approximately 1100 bp band was amplified, cloned into 
pT7Blue vector and sequenced. This clone was designated pSJ94 and contained a 1120 
bp insert starting with the SBE A oligo and ending with a poly A tail. There was a 
predicted open reading frame of 235 amino acids which was highly homologous (79% 
identical) to a potato SBE II also isolated by the inventors (data not shown) suggesting that 
this clone represented a class A (SBE II) gene. 

To obtain the sequence of a full length clone, nested primers were made complementary 
to the 5' end of this sequence and used in 5' RACE PCR to isolate clones from the 5' 
region of the gene. A total of three rounds of 5' RACE was needed to determine the 
sequence of the complete gene (i.e. one that has a predicted long ORF preceded by stop 
codons). It should be noted that during this cloning process several clones (#23, 9, 16) 
were obtained that had small deletions and in one case (clone 23) there was also a small 
(120 bp) intron present. These occurrences are not uncommon and probably arise through 
errors in the PCR process and/or reverse transcription of incompletely processed RNA 
(heterogeneous nuclear RNA). 

The overlapping cDNA fragments could be assembled into a contiguous 3 kb sequence 
(designated csbe2con.seq) which contained one long predicted ORF as shown in Figure 
2. Several clones in the last round of 5' RACE were obtained which included sequence 
of the untranslated leader (UTL). All of these clones had an ORF (42 amino acids) 46 bp 
upstream and out of frame with that of the long ORF. 

There is more than one SBE II gene in cassava 

In order to determine if the assembled sequence represented that of a single gene, attempts 
were made to recover by PCR a full length SBE II gene using primers CSBE214 and 
CSBE23 at the 5' and 3' ends of the csbe2con sequence respectively. All attempts were 
unsuccessful using either leaf or root cDNA as template. The PCR was therefore repeated 
with either the 5'- or 3'- most primer and complementary primers along the length of the 
SBE II gene to determine the size of the largest fragment that could be amplified. With 
the CSBE214 primer, fragments could be amplified using primers 210, 28, 27 and 22 in 
order of increasing distance, the latter primer pair amplifying a 2.2 kb band. With the 3' 
primer CSBE23, only primer pairs with 21 and 26 gave amplification products, the latter 
being about 1200 bp. These results suggest that the original 3' RACE clone (pSJ94) is 
derived from a different SBE II gene than the rest of the 5' RACE clones even though the 
two largest PCR fragments (214 + 22 and 26 + 23) overlap by 750 bp and share several 
primer sites. It is likely that the sequence of the two genes starts to diverge around the 
CSBE22 primer site such that the 3' end of the corresponding gene does not contain the 
23 primer and is not therefore able to amplify a cDNA when used with the 214 primer. 

To confirm this, the sequence of the longest 5' PCR fragment (214 + 22) from two clones 
(#20, designated pSJ99, and #35) was determined and compared to the consensus sequence 
csbe2con as shown in Figure 3. The first 2000 bases are nearly identical (the single base 
changes might well be PCR errors); however, the consensus sequence is significantly 
different after this. This region corresponds to the original 3' RACE fragment pSJ94 
(SBE A + Ri adaptor) and provided evidence that there may be more than one SBE II 
gene in cassava. 

The 3' end corresponding to pSJ99 was therefore cloned as follows: 3' RACE PCR was 
performed on leaf cDNA using the SBE A oligo as the gene specific primer so that all 
SBE II genes would be amplified. The cloned DNA fragments were then screened for the 
presence or absence of the CSBE23 primer by PCR. Two out of 15 clones were positive 
with the SBE A + Ri primer pair but negative with SBE A + CSBE23 primers. The 
sequence of these two clones (designated pSJ101, as shown in Figure 9) demonstrated that 
they were indeed from an SBE II gene and that they were different from pSJ94. However, 
the overlapping region of pSJ101 (the 3' clone) and pSJ99 (the 5' clone) was identical, 
suggesting that they were derived from the same gene. 

To confirm this, a primer (CSBE218) was made to a region in the 3' UTR (untranslated 
region) of pSJ101 and used in combination with CSBE214 primer to recover by PCR a full 
length cDNA from both leaf and root cDNA. These clones were sequenced and 
designated pSJ106 & pSJ107 respectively. The sequence and predicted ORF of pSJ107 
is shown in Figure 4. The long ORF in plasmid pSJ106 was found to be interrupted by 
a stop codon (presumably introduced in the PCR process) approximately 1 kb from the 3' 
end of the gene, therefore another cDNA clone (designated pSJ116) was amplified in a 
separate reaction, cloned and sequenced. This clone had an intact ORF (data not shown). 
There were only a few differences in these two sequences (in the transit peptide aa 27-41: 
YRRTSSCLSFNFKEA to DRRTSSCLSFIFKKAA and L831 in pSJ107 to V in pSJ116 
respectively). 

An additional 740 bp of sequence of the gene corresponding to the pSJ94 clone was 
isolated by 5' RACE using the primers CSBE216 and 217, and was designated pSJ125. 
This sequence was combined with that of pSJ94 to form a consensus sequence "125 + 
94", as shown in Figure 10. The sequence of this second gene is about 90% identical at 
the DNA and protein level to pSJ116, as shown in Figures 5 and 6, and is clearly a second 
form of SBE II in cassava. The 3' untranslated regions of the two genes are not related 
(data not shown). 

It was also determined that the full length cassava SBE II genes (from both leaf and tuber) 
actually encode active starch branching enzymes since the cloned genes were able to 
complement the glycogen branching enzyme deficient E. coli mutant KV832. 

Main Findings 

1) A full length cDNA clone of a starch branching enzyme II (SBE II) gene has been 
cloned from leaves and starch storing roots of cassava. This cDNA encodes an 836 amino 
acid protein (Mr 95 kD) and is 86% identical to pea SBE I over the central conserved 
domain, although the level of sequence identity over the entire coding region is lower than 
86%. 

2) There is more than one SBE II gene in cassava as a second partial SBE II cDNA was 
isolated which differs slightly in the protein coding region from the first gene and has no 
homology in the 3' untranslated region. 

3) The isolated full length cDNA from both leaves and roots encodes an active SBE as 
it complements an E. coli mutant deficient in glycogen branching enzyme as assayed by 
iodine staining. 

We have shown that there are SBE II (Class A) gene sequences present in the cassava 
genome by isolating cDNA fragments using 3' and 5' RACE. From these cDNA 
fragments a consensus sequence of over 3 kb could be compiled which contained one long 
open reading frame (Figure 2) which is highly homologous to other SBE II (class A) genes 
(data not shown). It is likely that the consensus sequence does not represent that of a 
single gene since attempts to PCR a full length gene using primers at the 5' and 3' ends 
of this sequence were not successful. In fact screening of a number of leaf derived 3' 
RACE cDNAs showed that a second SBE II gene (clone designated pSJ101) was also 
expressed which is highly homologous within the coding region to the originally isolated 
cDNA (pSJ94) but has a different 3' UTR. A full length SBE II gene was isolated from 
leaves and roots by PCR using a new primer to the 3' end of this sequence and the 
original sequence at the 5' end of the consensus sequence. If the frequency of clones 
isolated by 3' RACE PCR reflects the abundance of the mRNA levels then this full length 
gene may be expressed at lower levels in the leaf than the pSJ94 clone (2 out of 15 were 
the former class, 13/15 the latter). It should be noted that each class is expressed in both 
leaves and roots as judged by PCR (data not shown). Sequence analysis of the predicted 
ORF of the leaf and root genes showed only a few differences (4 amino acid changes and 
one deletion) which could have arisen through PCR errors or, alternatively, there may be 
more than one nearly identical gene expressed in these tissues. 

A comparison of all known SBE II protein sequences shows that the cassava SBE II gene 
is most closely related to the pea gene (Figure 8). The two proteins are 86.3% identical 
over a 686 amino acid range which extends from the triple proline "elbow" (Burton et al., 
1995 Plant J. 7, 3-15) to the conserved WYA sequence immediately preceding the C- 
terminal extensions (data not shown). All SBE II proteins are conserved over this range 
in that they are at least 80% similar to each other. Remarkably however, the sequence 
conservation between the pea, potato and cassava SBE II proteins also extends to the N- 
terminal transit peptide, especially the first 12 amino acids of the precursor protein and 
the region surrounding the mature terminus of the pea protein (AKFSRDS). Because the 
proteins are so similar around this region it can be predicted that the mature terminus of 
the cassava SBE II protein is likely to be GKSSHES. The precursor has a predicted 
molecular mass of 96 kD and the mature protein a predicted molecular mass of 91.3 kD. 
The cassava SBE II has a short acidic tail at the C-terminus although this is not as long or 
as acidic as that found in the pea or potato proteins. The significance of this acidic tail, 
if any, remains to be determined. One notable difference between the amino acid 
sequence of cassava SBE II and all other SBE II proteins is the presence of the sequence 
NSKH at around position 697 instead of the conserved sequence DAD/EY. Although this 
conserved region forms part of a predicted α-helix (number 8) of the catalytic (β/α)8 barrel 
domain (Burton et al. 1995 cited previously), this difference does not abolish the SBE 
activity of the cassava protein as this gene can still complement the glycogen branching 
deletion mutant of E. coli. It may however affect the specificity of the protein. An 
interesting point is that the other cassava SBE II clone pSJ94 has the conserved sequence 
DADY. 

One other point of interest concerning the sequence of the SBE II gene is the presence of 
an upstream ATG in the 5' UTR. This ATG could initiate a small peptide of 42 amino 
acids which would terminate downstream of the predicted initiating methionine codon of 
the SBE II precursor. If this does occur then the translation of the SBE II protein from this 
mRNA is likely to be inefficient as ribosomes normally initiate at the 5' most ATG in the 
mRNA. However the first ATG is in a poorer Kozak context than the SBE II initiator and 
it may be too close to the 5' end of the message to initiate efficiently (14 nucleotides) thus 
allowing initiation to occur at the correct ATG. 
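
The reasoning about the upstream ATG can be illustrated with a short Python sketch that scans a 5' leader for ATG codons lying 5' of the main initiator, counts the codons up to the first in-frame stop, and reports whether each upstream ATG is in frame with the main ORF. The sequence used here is a made-up toy example, not the cassava leader, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_orfs(mrna: str, main_start: int) -> list:
    """Describe every ATG located 5' of the main initiator codon.

    Returns a list of (atg_index, codon_count, in_frame_with_main) tuples, where
    atg_index is also the number of nucleotides between the 5' end and the ATG,
    and codon_count excludes the stop codon.
    """
    results = []
    pos = mrna.find("ATG")
    while pos != -1 and pos < main_start:
        codons = 0
        # Walk codon by codon from the upstream ATG until a stop codon is met.
        for i in range(pos, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                break
            codons += 1
        in_frame = (main_start - pos) % 3 == 0
        results.append((pos, codons, in_frame))
        pos = mrna.find("ATG", pos + 1)
    return results


# Toy message (hypothetical): a two-codon upstream ORF followed by the main ATG.
TOY_MRNA = "ACATGGCTTGACATGAAATAA"
TOY_MAIN_START = TOY_MRNA.find("ATGAAA")

for atg_index, codon_count, in_frame in upstream_orfs(TOY_MRNA, TOY_MAIN_START):
    print(f"upstream ATG {atg_index} nt from the 5' end, "
          f"{codon_count} codons, in frame with main ORF: {in_frame}")
```

A scan of this kind only locates candidate upstream ORFs; whether ribosomes actually initiate at them depends on sequence context and distance from the cap, as discussed above.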

In conclusion we have shown that cassava does have SBE II gene sequences, that they are 
expressed in both leaves and tubers and that more than one gene exists. 

Example 2 

Cloning of a second full length cassava SBE II gene 



Methods 

Oligonucleotides 

CSBE219   CTTTATCTATTAAAGACTTC  (Seq ID No. 20) 
CSBE220   CAAAAAAGTTTGTGACATGG  (Seq ID No. 21) 
CSBE221   TCACTTTTTCCAATGCTAAT  (Seq ID No. 22) 
CSBE222   TCTCATGCAATGGAACCGAC  (Seq ID No. 23) 
CSBE223   CAGATGTCCTGACTCGGAAT  (Seq ID No. 24) 
CSBE224   ATTCCGAGTCAGGACATCTG  (Seq ID No. 25) 
CSBE225   CGCATTTCTCGCTATTGCTT  (Seq ID No. 26) 
CSBE226   CACAGGCCCAAGTGAAGAAT  (Seq ID No. 27) 

The 5' end of the gene corresponding to the 3'RACE clone pSJ94 was isolated in three 
rounds of 5' RACE. Prior to performing the first round of 5' RACE, 5 µg of total leaf 
RNA was reverse transcribed in a 20 µl reaction using conditions as described by the 
manufacturer (Superscript enzyme, BRL) and 10 pmol of the SBE II gene specific primer 
CSBE23. Primers were then removed and the cDNA tailed with dATP as described 
above. The first round of 5'RACE used primers CSBE216 and Ro. This PCR reaction 
was diluted 1:20 and used as a template for a second round of amplification using primers 
CSBE217 and Ri. The gene specific primers were designed so that they would 
preferentially hybridise to the SBE II sequence in pSJ94. Amplified products appeared 
as a smear of approximately 600-1200 bp when subjected to electrophoresis on a 1 % TAE 
agarose gel. 

This smear was excised and DNA purified using a Qiaquick column (Qiagen) before 
ligation to the pT7Blue vector. Several clones were sequenced and clone #7 was 
designated pSJ125. New primers (CSBE219 and 220) were designed to hybridise to the 
5' end of pSJ125 and a second round of 5'RACE was performed using the same CSBE23 
primed library. Two fragments of 600 and 800 bp were cloned and sequenced (clones 
13,17). Primers CSBE221 and 222 were designed to hybridise to the 5' sequence of the 
longest clone (#13) and a third round of 5' RACE was performed on a new library (5 µg 
total leaf RNA reverse transcribed with Superscript using CSBE220 as primer and then 
dATP tailed with TdT from Boehringer Mannheim). Fragments of approximately 500 bp 
were amplified, cloned and sequenced. Clone #13 was designated pSJ143. The process 
is illustrated schematically in Figure 12. 

To isolate a full length gene as a contiguous sequence, a new primer (CSBE225) was 
designed to hybridise to the 5' end of clone pSJ143 and used with one of the primers 
(CSBE226 or 23) in the 3' end of clone pSJ94, in a PCR reaction using RoRidT17 primed 
leaf cDNA as template. Use of primer CSBE226 resulted in production of Clone #2 
(designated pSJ144), and use of primer CSBE23 resulted in production of Clones #10 and 
13 (designated pSJ145 and pSJ146 respectively). Only pSJ146 was sequenced fully. 

Results 

Isolation of a second full length cassava SBE II gene 

A full length clone for a second SBE II gene was isolated by extending the sequence of 
pSJ94 in three rounds of 5' RACE as illustrated schematically in Figure 12. In each 
round of 5' RACE, primers were designed that would preferentially hybridise to the new 
sequence rather than to the gene represented by pSJ116. In the final round of 5' RACE, 
three clones were obtained that had the initiating methionine codon, and none of these had 
upstream ATGs. The overlapping cDNA fragments (sequences of the 5' RACE clones 
pSJ143, 13, pSJ125 and the 3' RACE clone pSJ94) could be assembled into a consensus 
sequence of approximately 3 kb which was designated csbe2-2.seq. This sequence 
contained one long ORF with a predicted size of 848 aa (Mr 97 kDa). The full length 
gene was then isolated as a contiguous sequence by PCR amplification from RoRidT17 
primed leaf cDNA using primers at the 5' (CSBE225) and 3' (CSBE23 or CSBE226) ends 
of the RACE clones. One clone, designated pSJ146, was sequenced and the restriction 
map is shown along with the predicted amino acid sequence in Figure 13. 

Sequence homologies between SBE II genes 

The two cassava genes (pSJ116 and pSJ146) share 88.8% identity at the DNA level over 
the entire coding region (data not shown). The homology extends about 50 bases outside 
of this region but beyond this the untranslated regions show no similarity (data not 
shown). At the protein level the two genes show 86% identity over the entire ORF (data 
not shown). The two genes are more closely related to each other than to any other SBE 
II. Between species, the pea SBE I shows the most homology to the cassava SBE II 
genes. 
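
Percent identity figures such as those quoted above are computed from aligned sequences. The following Python sketch shows one common convention (matches divided by the number of aligned, non-gap columns); the alignment program and exact convention used to obtain the figures in this specification are not stated, so the function is an assumption for illustration only. It is applied to the two heptapeptides given in Example 1 for the predicted mature termini of the cassava (GKSSHES) and pea (AKFSRDS) proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Both inputs must already be aligned and therefore of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


cassava_terminus = "GKSSHES"
pea_terminus = "AKFSRDS"
print(f"{percent_identity(cassava_terminus, pea_terminus):.1f}% identical")
```

For these seven residues three positions match (K, S and the final S), giving about 42.9% identity. Other conventions, for example dividing by the length of the shorter sequence or counting gap columns, would give different values for gapped alignments.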

Example 3 

Construction of plant transformation vectors and transformation of cassava with 
antisense starch branching enzyme genes. 

This example describes in detail how a portion of the SBE II gene isolated from cassava 
may be introduced into cassava plants to create transgenic plants with altered properties. 

An 1100 bp Hind III - Sac I fragment of cassava SBE II (from plasmid pSJ94) was cloned 
into the Hind III - Sac I sites of the plant transformation vector pSJ64 (Figure 11). This 
placed the SBE II gene in an antisense orientation between the 2X 35S CaMV promoter 
and the nopaline synthase polyadenylation signal. pSJ64 is a derivative of the binary 
vector pGPTV-HYG (Becker et al., 1992 Plant Molecular Biology 20: 1195-1197) 
modified by inclusion of an approximately 750 bp fragment of pJIT60 (Guerineau et al., 
1992 Plant Mol. Biol. 18, 815-818) containing the duplicated cauliflower mosaic virus 
(CaMV) 35S promoter (Cabb-JI strain, equivalent to nucleotides 7040 to 7376 duplicated 
upstream of 7040 to 7433, as described by Frank et al., 1980 Cell 21, 285-294) to replace 
the GUS coding sequence. A similar construct was made with the cassava SBE II 
sequence from plasmid pSJ101. 

These plasmids are then introduced into Agrobacterium tumefaciens LBA4404 by a direct 
DNA uptake method (An et al., Binary vectors, In: Plant Molecular Biology Manual (ed 
Galvin and Schilperoort) AD 1988 pp 1-19) and can be used to transform cassava somatic 
embryos by selecting on hygromycin as described by Li et al. (1996, Nature 
Biotechnology 14, 736-740). 

SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: National Starch and Chemical Investment 

Holding Corporation 

(B) STREET: Suite 27, 501 Silverside Road 

(C) CITY: Wilmington 

(D) STATE: Delaware 

(E) COUNTRY: USA 

(F) POSTAL CODE (ZIP): 19809 

(ii) TITLE OF INVENTION: Improvements in or Relating to Starch 

Content of Plants 

(iii) NUMBER OF SEQUENCES: 31 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO) 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
ATGGACAAGG ATATGTATGA 20 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
GGTTTCATGA CTTCTGAGCA 20 



(2) INFORMATION FOR SEQ ID NO: 3: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TGCTCAGAAG TCATGAAACC 20 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TCCAGTCTCA ATATACGTCG 20 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
AGGAGTAGAT GGTCTGTCGA 20 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TCATACATAT CCTTGTCCAT 20 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GGGTGACTTC AATGATGTAC 20 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGTGTACATC ATTGAAGTCA 20 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
AATTACTGGC TCCGTACTAC 20 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CATTCCAACG TGCGACTCAT 20 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TACCGGTAAT CTAGGTGTTG 20 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GGACCTTGGT TTAGATCCAA 20 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
ATGAGTCGCA CGTTGGAATG 20 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CAACACCTAG ATTACCGGTA 20 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
TTAGTTGCGT CAGTTCTCAC 20 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
AATATCTATC TCAGCCGGAG 20 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
ATCTTAGATA GTCTGCATCA 20 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TGGTTGTTCC CTGGAATTAC 20 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TGCAAGGACC GTGACATCAA 20 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CTTTATCTAT TAAAGACTTC 20 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
CAAAAAAGTT TGTGACATGG 20 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
TCACTTTTTC CAATGCTAAT 20 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TCTCATGCAA TGGAACCGAC 20 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CAGATGTCCT GACTCGGAAT 20 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
ATTCCGAGTC AGGACATCTG 20 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CGCATTTCTC GCTATTGCTT 20 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CACAGGCCCA AGTGAAGAAT 20 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2588 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 21. .2531 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTCTCTAACT TCTCAGCGAA ATG GGA CAC TAC ACC ATA TCA GGA ATA CGT 50 
Met Gly His Tyr Thr Ile Ser Gly Ile Arg 
1 5 10 

TTT CCT TGT GCT CCA CTC TGC AAA TCT CAA TCT ACC GGC TTC CAT GGC 98 
Phe Pro Cys Ala Pro Leu Cys Lys Ser Gln Ser Thr Gly Phe His Gly 
15 20 25 

TAT CGG AGG ACC TCC TCT TGC CTT TCC TTC AAC TTC AAG GAG GCG TTT 146 
Tyr Arg Arg Thr Ser Ser Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe 
30 35 40 

TCT AGG AGG GTC TTC TCT GGA AAG TCA TCT CAT GAA TCT GAC TCC TCA 194 
Ser Arg Arg Val Phe Ser Gly Lys Ser Ser His Glu Ser Asp Ser Ser 
45 50 55 

AAT GTA ATG GTC ACT GCT TCT AAA AGA GTC CTT CCT GAT GGT CGG ATT 242 
Asn Val Met Val Thr Ala Ser Lys Arg Val Leu Pro Asp Gly Arg Ile 
60 65 70 

GAA TGC TAT TCT TCT TCA ACA GAT CAA TTG GAA GCC CCT GGC ACA GTT 290 
Glu Cys Tyr Ser Ser Ser Thr Asp Gln Leu Glu Ala Pro Gly Thr Val 
75 80 85 90 

TCA GAA GAA TCC CAG GTG CTT ACT GAT GTT GAG AGT CTC ATT ATG GAT 338 
Ser Glu Glu Ser Gln Val Leu Thr Asp Val Glu Ser Leu Ile Met Asp 
95 100 105 

GAT AAG ATT GTT GAA GAT GAA GTA AAT AAA GAA TCT GTT CCA ATG CGG 386 
Asp Lys Ile Val Glu Asp Glu Val Asn Lys Glu Ser Val Pro Met Arg 
110 115 120 

GAG ACA GTT AGC ATC AGA AAA ATT GGA TCT AAA CCA AGG TCC ATT CCT 434 
Glu Thr Val Ser Ile Arg Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro 
125 130 135 

CCA CCC GGC AGA GGG CAA AGA ATA TAT GAC ATA GAT CCA AGC TTG ACA 482 
Pro Pro Gly Arg Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr 
140 145 150 

GGC TTT CGT CAA CAC CTA GAT TAC CGG TAT TCA CAG TAC AAA AGA CTC 530 
Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu 
155 160 165 170 

CGA GAA GAA ATT GAC AAG TAT GAA GGT AGT CTG GAT GCA TTT TCT CGT 578 
Arg Glu Glu Ile Asp Lys Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg 
175 180 185 

GGC TAT GAA AAG TTT GGT TTC TCA CGC AGT GAA ACA GGA ATA ACT TAT 626 
Gly Tyr Glu Lys Phe Gly Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr 
190 195 200 

AGA GAG TGG GCA CCA GGA GCT ACG TGG GCT GCA TTG ATT GGA GAT TTC 674 
Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe 
205 210 215 

AAT AAC TGG AAT CCT AAT GCA GAT GTC ATG ACT CAG AAT GAG TGT GGT 722 
Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Gln Asn Glu Cys Gly 
220 225 230 

GTC TGG GAG ATC TTT TTG CCG AAT AAT GCA GAT GGT TCA CCA CCA ATT 770 
Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile 
235 240 245 250 

CCC CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT GGC AAC 818 
Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Asn 
255 260 265 

AAA GAT TCT An CCT GCT TGG ATC AAG TTC TCA GTT CAA GCA CCA GGT 866 
LysiAsp Ser He Pro Ala Trp He Lys Phe Ser Val Gin Ala Pro Gly 
270 275 280 

GAA CTC CCA TAT AAT GGC ATA TAC TAT GAT CCT CCC GAG GAG GAG AAG 914 
Glu Leu Pro Tyr Asn Gly Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys 
285 290 295 

TAT GTG TTC AAA AAT CCT CAG CCA AAG AGA CCA AAA TCA CTT CGG ATT 962 
Tyr Val Phe Lys Asn Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile 
300 305 310 

TAT GAG TCG CAC GTT GGA ATG AGT AGT ACG GAG CCA GTA ATT AAC ACA 1010 
Tyr Glu Ser His Val Gly Met Ser Ser Thr Glu Pro Val Ile Asn Thr 
315 320 325 330 

TAT GCC AAC TTT AGA GAT GAT GTG CTT CCT CGC ATC AAA AAG CTT GGC 1058 
Tyr Ala Asn Phe Arg Asp Asp Val Leu Pro Arg Ile Lys Lys Leu Gly 
335 340 345 

TAC AAT GCT GTT CAG CTC ATG GCT ATT CAA GAG CAT TCA TAT TAT GCT 1106 
Tyr Asn Ala Val Gln Leu Met Ala Ile Gln Glu His Ser Tyr Tyr Ala 
350 355 360 

AGT TTT GGG TAT CAC GTC ACA AAC TTT TAT GCA GCT AGC AGC CGA TTT 1154 
Ser Phe Gly Tyr His Val Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe 
365 370 375 

GGA ACT CCT GAT GAT TTA AAG TCT CTA ATA GAT AAA GCT CAC GAG TTA 1202 
Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu 
380 385 390 

GGT CTT CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCA TCA ACT AAT 1250 
Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Thr Asn 
395 400 405 410 

ACG TTG GAT GGG CTG AAT ATG TTT GAT GGT ACG GAT GGT CAC TAC TTT 1298 
Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Gly His Tyr Phe 
415 420 425 

CAC TCT GGA CCA CGG GGT CAT CAT TGG ATG TGG GAC TCT CGC CTT TTC 1346 
His Ser Gly Pro Arg Gly His His Trp Met Trp Asp Ser Arg Leu Phe 
430 435 440 

AAC TAT GGG AGC TGG GAG GTT CTA AGG TTT CTT CTT TCA AAT GCA AGG 1394 
Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg 
445 450 455 

TGG TGG TTG GAT GAG TAC AAG TTT GAT GGG TTC AGA TTT GAT GGG GTG 
Trp Trp Leu Asp Glu Tyr Lys Phe Asp Gly Phe Arg Phe Asp Gly Val 
460 465 470 

ACT TCA ATG ATG TAC ACC CAT CAT GGA TTG CAG GTA GAT TTT ACC GGC 
Thr Ser Met Met Tyr Thr His His Gly Leu Gln Val Asp Phe Thr Gly 
475 480 485 490 

AAC TAC AAT GAA TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT GTG GTT 
Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Val 
495 500 505 

TAT TTG ATG CTG TTG AAT GAT ATG ATT CAT GGT CTC TTC CCA GAG GCT 
Tyr Leu Met Leu Leu Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala 
510 515 520 

GTC ACC ATT GGT GAA GAT GTT AGT GGA ATG CCA ACA GTT TGC ATT CCG 
Val Thr Ile Gly Glu Asp Val Ser Gly Met Pro Thr Val Cys Ile Pro 
525 530 535 

GTT GAA GAT GGT GGT GTT GGC TTT GAT TAT CGT CTC CAC ATG GCT GTT 
Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Val 
540 545 550 

GCT GAT AAA TGG GTT GAG ATT ATT CAG AAG AGA GAT GAA GAT TGG AAA 
Ala Asp Lys Trp Val Glu Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys 
555 560 565 570 

ATG GGT GAC ATT GTA CAT ATG CTG ACC AAC AGG CGG TGG TTG GAA AAG 
Met Gly Asp Ile Val His Met Leu Thr Asn Arg Arg Trp Leu Glu Lys 
575 580 585 

TGT GTT TCT TAT GCT GAA AGT CAT GAC CAG GCC CTT GTT GGT GAC AAA 
Cys Val Ser Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys 
590 595 600 

ACT ATT GCA TTT TGG CTG ATG GAC AAG GAT ATG TAT GAC TTC ATG GCT 
Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala 
605 610 615 

CTT GAC AGA CCA TCT ACT CCT CTC ATA GAT CGT GGA GTA GCA TTG CAC 
Leu Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Val Ala Leu His 
620 625 630 

AAA ATG ATC AGG CTT ATT ACC ATG GGA TTA GGC GGA GAA GGA TAT TTG 
Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu 
635 640 645 650 

AAT     ATG GGA AAT GAA TTT GGA CAC CCC GAG TGG ATT GAT TTT CCA 
Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro 
655 660 665 

AGA GGT GAT CTA CAT CTT CCC AGT GGT AAA TTT GTT CCT GGG AAC AAT 
Arg Gly Asp Leu His Leu Pro Ser Gly Lys Phe Val Pro Gly Asn Asn 
670 675 680 

TAC AGT TAT GAT AAA TGC CGG CGT AGG TTT GAT CTA GGC AAT TCA AAG 2114 
Tyr Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys 
685 690 695 

CAT CTG AGA TAT CAT GGA ATG CAA GAG TTT GAT CAA GCA ATT CAG CAT 2162 
His Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Ile Gln His 
700 705 710 

CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAA TAC ATA TCA 2210 
Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser 
715 720 725 730 

CGG AAG GAT GAA AGG GAT CGG ATC ATT GTC TTC GAG AGG GGA AAC CTC 2258 
Arg Lys Asp Glu Arg Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu 
735 740 745 

GTT TTT GTA TTC AAT TTT CAT TGG ACT AGC AGC TAT TCG GAT TAC CGA 2306 
Val Phe Val Phe Asn Phe His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg 
750 755 760 

GTT GGC TGC TTA AAG CCA GGA AAG TAC AAG ATA GTC TTG GAT TCA GAT 2354 
Val Gly Cys Leu Lys Pro Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp 
765 770 775 

GAT CCT TTG TTT GGA GGC TTT GGC AGG CTT AGT CAT GAT GCA GAG CAC 2402 
Asp Pro Leu Phe Gly Gly Phe Gly Arg Leu Ser His Asp Ala Glu His 
780 785 790 

TTC AGC TTT GAA GGG TGG TAC GAT AAC CGG CCT CGA TCC TTC ATG GTG 2450 
Phe Ser Phe Glu Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val 
795 800 805 810 

TAC ACA CCA TGT AGA ACA GCA GTG GTC TAT GCT TTA GTG GAG GAT GAA 2498 
Tyr Thr Pro Cys Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu 
815 820 825 

GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA GATATATCTT AACAACAGGT 2551 
Val Glu Asn Glu Leu Glu Pro Val Ala Gly * 
830 835 

TCTGAAGCAG GAATGCCATT ATTGATCTTC CTATGTT 2588 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 837 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Gly His Tyr Thr Ile Ser Gly Ile Arg Phe Pro Cys Ala Pro Leu 
1 5 10 15 

Cys Lys Ser Gln Ser Thr Gly Phe His Gly Tyr Arg Arg Thr Ser Ser 
20 25 30 

Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe Ser Arg Arg Val Phe Ser 
35 40 45 

Gly Lys Ser Ser His Glu Ser Asp Ser Ser Asn Val Met Val Thr Ala 
50 55 60 

Ser Lys Arg Val Leu Pro Asp Gly Arg Ile Glu Cys Tyr Ser Ser Ser 
65 70 75 80 

Thr Asp Gln Leu Glu Ala Pro Gly Thr Val Ser Glu Glu Ser Gln Val 
85 90 95 

Leu Thr Asp Val Glu Ser Leu Ile Met Asp Asp Lys Ile Val Glu Asp 
100 105 110 

Glu Val Asn Lys Glu Ser Val Pro Met Arg Glu Thr Val Ser Ile Arg 
115 120 125 

Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro Pro Pro Gly Arg Gly Gln 
130 135 140 

Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr Gly Phe Arg Gln His Leu 
145 150 155 160 

Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg Glu Glu Ile Asp Lys 
165 170 175 

Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg Gly Tyr Glu Lys Phe Gly 
180 185 190 

Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr Arg Glu Trp Ala Pro Gly 
195 200 205 

Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asn Pro Asn 
210 215 220 

Ala Asp Val Met Thr Gln Asn Glu Cys Gly Val Trp Glu Ile Phe Leu 
225 230 235 240 

Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro His Gly Ser Arg Val 
245 250 255 

Lys Ile Arg Met Asp Thr Pro Ser Gly Asn Lys Asp Ser Ile Pro Ala 
260 265 270 

Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu Leu Pro Tyr Asn Gly 
275 280 285 

Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys Tyr Val Phe Lys Asn Pro 
290 295 300 

Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr Glu Ser His Val Gly 
305 310 315 320 

Met Ser Ser Thr Glu Pro Val Ile Asn Thr Tyr Ala Asn Phe Arg Asp 
325 330 335 

Asp Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Val Gln Leu 
340 345 350 

Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Phe Gly Tyr His Val 
355 360 365 

Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe Gly Thr Pro Asp Asp Leu 
370 375 380 

Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly Leu Leu Val Leu Met 
385 390 395 400 

Asp Ile Val His Ser His Ala Ser Thr Asn Thr Leu Asp Gly Leu Asn 
405 410 415 

Met Phe Asp Gly Thr Asp Gly His Tyr Phe His Ser Gly Pro Arg Gly 
420 425 430 

His His Trp Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Ser Trp Glu 
435 440 445 

Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp Glu Tyr 
450 455 460 

Lys Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met Tyr Thr 
465 470 475 480 

His His Gly Leu Gln Val Asp Phe Thr Gly Asn Tyr Asn Glu Tyr Phe 
485 490 495 

Gly Tyr Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu Leu Asn 
500 505 510 

Asp Met Ile His Gly Leu Phe Pro Glu Ala Val Thr Ile Gly Glu Asp 
515 520 525 

Val Ser Gly Met Pro Thr Val Cys Ile Pro Val Glu Asp Gly Gly Val 
530 535 540 

Gly Phe Asp Tyr Arg Leu His Met Ala Val Ala Asp Lys Trp Val Glu 
545 550 555 560 

Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys Met Gly Asp Ile Val His 
565 570 575 

Met Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys Val Ser Tyr Ala Glu 
580 585 590 

Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe Trp Leu 
595 600 605 

Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro Ser Thr 
610 615 620 

Pro Leu Ile Asp Arg Gly Val Ala Leu His Lys Met Ile Arg Leu Ile 
625 630 635 640 

Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly Asn Glu 
645 650 655 

Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Gly Asp Leu His Leu 
660 665 670 

Pro Ser Gly Lys Phe Val Pro Gly Asn Asn Tyr Ser Tyr Asp Lys Cys 
675 680 685 

Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys His Leu Arg Tyr His Gly 
690 695 700 

Met Gln Glu Phe Asp Gln Ala Ile Gln His Leu Glu Glu Ala Tyr Gly 
705 710 715 720 

Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg Lys Asp Glu Arg Asp 
725 730 735 

Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val Phe Val Phe Asn Phe 
740 745 750 

His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg Val Gly Cys Leu Lys Pro 
755 760 765 

Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp Pro Leu Phe Gly Gly 
770 775 780 

Phe Gly Arg Leu Ser His Asp Ala Glu His Phe Ser Phe Glu Gly Trp 
785 790 795 800 

Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr Thr Pro Cys Arg Thr 
805 810 815 

Ala Val Val Tyr Ala Leu Val Glu Asp Glu Val Glu Asn Glu Leu Glu 
820 825 830 

Pro Val Ala Gly * 
835 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2805 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 131..2677 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

AGTGAATTCG AGCTCGGTAC CCGGGGATCC GATTCGCATT TCTCGCTATT GCTTTCCGTT 60 

TATTTCCATA TATAAAATAT CAAATCTAAT CACTTGCGCC ATTTCTATCT CTCTCCAAAC 120 

TCTCACCGAA ATG GTA TAC TAC ACT GTA TCA GGC ATA CGT TTT CCT TGT 169 
Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys 
840 845 850 

GCA CCT TCA CTC TAC AAA TCT CAG CTC ACC AGC TTC CAT GGC GGT CGA 217 
Ala Pro Ser Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg 
855 860 865 

AGG ACC TCT TCT GGC CTT TCC TTC CTC TTG AAG AAG GAG CTG TTT CCT 265 
Arg Thr Ser Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro 
870 875 880 

CGG AAG ATC TTT GCT GGA AAG TCC TCT TAT GAA TCT GAC TCC TCA AAT 313 
Arg Lys Ile Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn 
885 890 895 

TTA ACT GTC TCT GCA TCT GAG AAG GTC CTT GTT CCT GAT GAT CAG ATT 361 
Leu Thr Val Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile 
900 905 910 

GAT GGC TCT TCT TCT TCA ACA TAT CAA TTA GAA ACC ACT GGC ACA GTT 409 
Asp Gly Ser Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val 
915 920 925 930 

TTG GAG GAA TCC CAG GTT CTT GGT GAT GCA GAG AGT CTT GTG ATG GAA 457 
Leu Glu Glu Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu 
935 940 945 

GAT GAT AAG AAT GTT GAG GAG GAT GAA GTA AAA AAA GAG TCG GTT CCA 505 
Asp Asp Lys Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro 
950 955 960 

TTG CAT GAG ACA ATT AGC ATT GGA AAA AGT GAA TCT AAA CCA AGG TCC 553 
Leu His Glu Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser 
965 970 975 

ATT CCT CCA CCT GGC AGT GGG CAG AGA ATA TAT GAC ATA GAF CCA AGC 601 
Ile Pro Pro Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser 
980 985 990 

TTG GCA GGT TTC CGT CAG CAT CTT GAC TAC CGA TAT TCA CAG TAC AAA 649 
Leu Ala Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys 
995 1000 1005 1010 

AGG CTG CGT GAG GAA ATT GAC AAG TAT GAA GGT GGT TTG GAT GCA TTC 697 
Arg Leu Arg Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe 
1015 1020 1025 

TCT CGT GGA TTT GAA AAG TTT GGT TTC TTA CGC AGT GAA ACA GGA ATA 745 
Ser Arg Gly Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile 
1030 1035 1040 

ACT TAT AGG GAA TGG GCA CCT GGA GCT ACG TGG GCT GCA CTT ATT GGA 793 
Thr Tyr Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly 
1045 1050 1055 

GAT TTC AAC AAT TGG AAT CCT AAT GCA GAT GTC ATG ACT CGG AAT GAG 841 
Asp Phe Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu 
1060 1065 1070 

TTT GGT GTC TGG GAG ATT TTT TTG CCA AAT AAC GCA GAT GGT TCA CCA 889 
Phe Gly Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro 
1075 1080 1085 1090 

CCA ATT CCT CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT 937 
Pro Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser 
1095 1100 1105 

GGC ATC AAA GAT TCA ATT CCT GCT TGG ATC AAG TTC TCA GTT CAG GCA 985 
Gly Ile Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala 
1110 1115 1120 

CCT GGT GAA ATC CCA TAC AAT GCC ATA TAC TAT GAT CCA CCA AAG GAG 1033 
Pro Gly Glu Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu 
1125 1130 1135 

GAG AAG TAT GTG TTC AAA CAT CCT CAG CCA AAG AGA CCA AAA TCA CTT 1081 
Glu Lys Tyr Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu 
1140 1145 1150 

AGG ATT TAT GAA TCT CAT GTT GGG ATG AGT AGT ATG GAG CCA ATA ATT 1129 
Arg Ile Tyr Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile 
1155 1160 1165 1170 

AAC ACA TAT GCC AAC TTT AGA GAT GAT ATG CTT CCT CGC ATC AAA AAG 1177 
Asn Thr Tyr Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys 
1175 1180 1185 

CTT GGC TAC AAT GCT GTT CAG ATC ATG GCT ATT CAA GAG CAT TCC TAT 1225 
Leu Gly Tyr Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr 
1190 1195 1200 

TAT GCT AGT TTT GGG TAC CAT GTC ACA AAC TTT TTT GCA CCT AGC AGC 1273 
Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser 
1205 1210 1215 

CGA     GGA ACT CCT GAT GAT TTG AAG TCT TTA ATA GAT AAA GCT CAT 1321 
Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His 
1220 1225 1230 

GAG TTA GGG CTG CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCG TCA 1369 
Glu Leu Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser 
1235 1240 1245 1250 

AAT AAT ACG TTG GAT GGG CTG AAC ATG TTT GAT GGT ACG GAT AGT CAC 1417 
Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His 
1255 1260 1265 

TAC TTC CAC TCC GGA TCA CGG GGT CAT CAT TGG TTG TGG GAC TCT CGC 
Tyr Phe His Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg 
1270 1275 1280 

CTT TTC AAC TAT GGA AGC TGG GAG GTG CTA AGA TTT CTT CTT TCA AAT 
Leu Phe Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn 
1285 1290 1295 

GCA AGA TGG TGG TTG GAA GAG TAC AGG TTT GAT GGT TTT AGA TTT GAT 
Ala Arg Trp Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp 
1300 1305 1310 

GGG GTG ACT TCC ATG ATG TAC ACT CCC CAT GGG TTG CAG GTA GCT TTT 
Gly Val Thr Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe 
1315 1320 1325 1330 

ACT GGC AAC TAC AAT GAG TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT 
Thr Gly Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala 
1335 1340 1345 

GTG ATT TAT TTG ATG CTT GTG AAT GAT ATG ATT CAC GGT CTT TTC CCT 
Val Ile Tyr Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro 
1350 1355 1360 

GAG GCT GTT ACC ATT GGT GAA GAT GTT AGC GGA AAG CCA ACA TTT TGC 
Glu Ala Val Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys 
1365 1370 1375 

ATT CCA GTG GAA GAT GGT GGT GTT GGA TTT GAT TAC CGT CTC CAC ATG 
Ile Pro Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met 
1380 1385 1390 

GCC ATT GCC GAT AAA TGG ATT GAG ATT CTT AAG AAG AGA GAT GAG GAC 
Ala Ile Ala Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp 
1395 1400 1405 1410 

TGG AAA ATG GGT GAC ATT GTG CAT ACA CTC ACC AAC AGA AGG TGG TTG 
Trp Lys Met Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu 
1415 1420 1425 

GAA AAA TGT GTT GCT TAT GCT GAA AGT CAT GAC CAA GCT CTT GTT GGT 
Glu Lys Cys Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly 
1430 1435 1440 

GAC AAA ACT ATT GCA TTT TGG CTG ATG GAC AAG GAC ATG TAC GAC TTC 
Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe 
1445 1450 1455 

ATG GCT CGT GAC AGA CCA TCT ACT CCT CTT ATA GAT CGT GGA ATA GCA 
Met Ala Arg Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala 
1460 1465 1470 

TTG CAC AAA ATG ATC AGG CTT ATT ACC ATG GGC TTA GGC GGA GAA GGA 
Leu His Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly 
1475 1480 1485 1490 

TAT TTG AAT TTT ATG GGA AAT GAA TTT GGA CAT CCT GAG TGG ATT GAT 21 
Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp 
1495 1500 1505 

TTT CCA AGA GGG GAT CGA CAT CTG CCC AAT GGT AAA GTA ATT CCA GGG 21 
Phe Pro Arg Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly 
1510 1515 1520 

AAC AAC CAC AGT TAT GAT AAA TGC CGT CGT AGA TTT GAT CTA GGT GAT 22 
Asn Asn His Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp 
1525 1530 1535 

GCA GAC TAT CTA AGA TAT CAT GGA ATG CAA GAG TTT GAT CAG GCA ATG 22 
Ala Asp Tyr Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met 
1540 1545 1550 

CAA CAT CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAG TAT 23 
Gln His Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr 
1555 1560 1565 1570 

ATA TCA CGG AAG GAT GAA GGA GAT CGG ATC ATT GTC TTT GAG AGG GGA 23 
Ile Ser Arg Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly 
1575 1580 1585 

AAC CTT GTT TTT GTA TTC AAC TTT CAT TGG ACT AAC AGC TAT TCA GAT 24 
Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp 
1590 1595 1600 

TAC CGA GTT GGC TGC TTC AAG TCA GGA AAG TAC AAG ATT GTT TTG GAC 24 
Tyr Arg Val Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp 
1605 1610 1615 

TCG GAT GAT GGC TTG TTT GGA GGC TTC AAC AGG CTT AGT CAT GAT GCC 25 
Ser Asp Asp Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala 
1620 1625 1630 

GAG CAC TTC ACC TTT GAC GGG TGG TAT GAT AAC CGG CCT CGG TCC TTC 25 
Glu His Phe Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe 
1635 1640 1645 1650 

ATG GTA TAT GCA CCA TCT AGG ACA GCA GTG GTC TAT GCT TTA GTA GAA 26 
Met Val Tyr Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu 
1655 1660 1665 

GAT GAA GAG AAT GAA GCA GAG AAT GAA GTA GAA AGT GAA GTG AAA CCA 26 
Asp Glu Glu Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro 
1670 1675 1680 

GCC TCC GGC TGA GATAGATATT TAGTAAGAGG ATCCCCTAAA GCAGGAATGG 27 
Ala Ser Gly * 
1685 

TTAACCTGTG CATCTGCATT GAACGACGTA TATTGAGACT GGAAATCCAT ATGACTAGTA 27 
GATCCTCTAG AGTCGACCTG CAGGCATG 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 849 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys Ala Pro Ser 
1 5 10 15 

Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg Arg Thr Ser 
20 25 30 

Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro Arg Lys Ile 
35 40 45 

Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn Leu Thr Val 
50 55 60 

Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile Asp Gly Ser 
65 70 75 80 

Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val Leu Glu Glu 
85 90 95 

Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu Asp Asp Lys 
100 105 110 

Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro Leu His Glu 
115 120 125 

Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser Ile Pro Pro 
130 135 140 

Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Ala Gly 
145 150 155 160 

Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg 
165 170 175 

Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe Ser Arg Gly 
180 185 190 

Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile Thr Tyr Arg 
195 200 205 

Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn 
210 215 220 

Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu Phe Gly Val 
225 230 235 240 

Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro 
245 250 255 

His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Ile Lys 
260 265 270 

Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu 
275 280 285 

Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu Glu Lys Tyr 
290 295 300 

Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr 
305 310 315 320 

Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile Asn Thr Tyr 
325 330 335 

Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys Leu Gly Tyr 
340 345 350 

Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser 
355 360 365 

Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe Gly 
370 375 380 

Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly 
385 390 395 400 

Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn Thr 
405 410 415 

Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His Tyr Phe His 
420 425 430 

Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg Leu Phe Asn 
435 440 445 

Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp 
450 455 460 

Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp Gly Val Thr 
465 470 475 480 

Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe Thr Gly Asn 
485 490 495 

Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Ile Tyr 
500 505 510 

Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala Val 
515 520 525 

Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys Ile Pro Val 
530 535 540 

Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile Ala 
545 550 555 560 

Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp Trp Lys Met 
565 570 575 

Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys 
580 585 590 

Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr 
595 600 605 

Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala Arg 
610 615 620 

Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala Leu His Lys 
625 630 635 640 

Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn 
645 650 655 

Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg 
660 665 670 

Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly Asn Asn His 
675 680 685 

Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Asp Tyr 
690 695 700 

Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met Gln His Leu 
705 710 715 720 

Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg 
725 730 735 

Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val 
740 745 750 

Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp Tyr Arg Val 
755 760 765 

Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp 
770 775 780 

Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala Glu His Phe 
785 790 795 800 

Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr 
805 810 815 

Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu Glu 
820 825 830 

Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro Ala Ser Gly * 
835 840 845 


Claims 

1. A nucleic acid sequence encoding a polypeptide having starch branching enzyme 
(SBE) activity, the encoded polypeptide comprising at least an effective portion of the 
amino acid sequence shown in Figure 4 or Figure 13. 

2. A nucleic acid sequence according to claim 1, comprising nucleotides 21-2531 of the 
nucleic acid sequence shown in Figure 4, or a functionally equivalent nucleotide sequence 
which hybridises under stringent hybridisation conditions with the nucleic acid sequence 
shown in Figure 4. 

3. A nucleic acid sequence according to claim 1, comprising nucleotides 131-2677 of the 
nucleic acid sequence shown in Figure 13, or a functionally equivalent sequence which 
hybridises under stringent hybridisation conditions with the nucleic acid sequence shown 
in Figure 13. 

4. A nucleic acid sequence according to any one of claims 1, 2 or 3 comprising a 5' 
and/or a 3' untranslated region. 

5. A nucleic acid sequence according to any one of the preceding claims, encoding a 
polypeptide having the amino acid sequence NSKH at about residue 697. 

6. A nucleic acid sequence comprising at least 200bp and exhibiting at least 88% 
sequence identity with the corresponding region of the DNA sequence shown in Figures 
4, 9, 10 or 13, operably linked in the sense or anti-sense orientation to a promoter 
operable in plants. 

7. A nucleic acid sequence according to claim 6, comprising at least 300-600bp. 

8. A sequence according to claim 6 or 7, comprising a 5' and/or 3' untranslated region. 

9. A sequence according to claim 8, comprising nucleotides 688-1044 of the sequence 
shown in Figure 9, and/or nucleotides 1507-1900 of the sequence shown in Figure 10. 

10. A sequence according to claim 6, comprising the nucleotide sequence shown in 
Figure 10. 



11. A replicable nucleic acid construct comprising a nucleic acid sequence according to 
any one of the preceding claims. 

12. A polypeptide having SBE activity and comprising an effective portion of the amino 
acid sequence shown in Figure 4 or Figure 13. 

13. A polypeptide according to claim 12, in substantial isolation from other polypeptides. 

14. A polypeptide according to claim 12 or 13, having the amino acid sequence NSKH 
at about position 697. 

15. A method of modifying starch in vitro, the method comprising treating starch to be 
modified under suitable conditions with an effective amount of a polypeptide according to 
any one of claims 12, 13 or 14. 



16. A method of altering a plant host cell, the method comprising introducing into the cell 
a nucleic acid sequence comprising at least 200bp and exhibiting at least 88% sequence 
identity with the corresponding region of the DNA sequence shown in Figures 4, 9, 10 
or 13, operably linked in the sense or anti-sense orientation to a suitable promoter active 
in the host cell, and causing transcription of the introduced nucleotide sequence, said 
transcript and/or the translation product thereof being sufficient to interfere with the 
expression of a homologous gene naturally present in the host cell, which homologous 
gene encodes a polypeptide having SBE activity. 

17. A method according to claim 16, wherein the host cell is from a cassava, banana, 
potato, pea, tomato, maize, wheat, barley, oat, sweet potato or rice plant. 

18. A method according to claim 16 or 17, comprising the introduction of one or more 
further nucleic acid sequences, operably linked in the sense or anti-sense orientation to a 
suitable promoter active in the host cell, and causing transcription of the one or more 
further nucleic acid sequences, said transcripts and/or translation products thereof being 
sufficient to interfere with the expression of homologous gene(s) present in the host cell. 

19. A method according to claim 18, wherein the one or more further nucleic acid 
sequences interfere with the expression of a gene involved in starch biosynthesis. 

20. A method according to claim 18 or 19, wherein the further nucleic acid sequence 
comprises at least part of an SBE I gene. 

21. A method according to claim 20, wherein the further nucleic acid sequence comprises 
at least part of the cassava SBE I gene. 

22. A method according to any one of claims 16-21, wherein the host cell is selected 
from one of the following: cassava, banana, potato, pea, tomato, maize, wheat, barley, 
oat, sweet potato or rice. 

23. A method according to any one of claims 16-22, wherein the altered host cell gives 
rise to starch having different properties compared to starch from an unaltered cell. 

24. A method according to any one of claims 16-23, further comprising the step of 
growing the altered host cell into a plant or plantlet. 

25. A method of obtaining starch having altered properties, comprising growing a plant 
from an altered host cell according to the method of claim 24, and extracting the starch 
therefrom. 

26. A plant or plant cell into which has been artificially introduced a nucleic acid 
sequence comprising at least 200bp and exhibiting at least 88% sequence identity with the 
corresponding region of the DNA sequence shown in Figures 4, 9, 10 or 13, operably 
linked in the sense or anti-sense orientation to a promoter operable in plants, or the 
progeny thereof. 

27. A plant according to claim 24, altered by the method of any one of claims 16-22. 

28. Starch obtainable from an altered plant according to claim 26 or 27, having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 

29. Starch obtained from an altered plant according to claim 26 or 27, having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 

30. Starch according to claim 28 or 29 obtained from an altered plant selected from the 
group consisting of:- cassava, banana, potato, pea, tomato, maize, wheat, barley, oat, 
sweet potato and rice plants. 

31. Starch according to any one of claims 28, 29 or 30, having increased amylose content 
compared to starch extracted from an equivalent but unaltered plant. 

Fig. 2. 

TATGGATTGACATCGATAATACGACTCACTATAGGGATTTCTTTTTTTTTCTTTTTGNTTTTTAAAAAAAGTTGAACATGCAATTAGTTGCGTCAGTTCTCACACTCTCTCTAACTTCTC 
ATACCTAACTGTAGCTATTATGCTGAGTGATATCCCTAAAGAAAAAAAAAGAAAAACNAAAAATTTTTTTCAACTTGTACGTTAATCAACGCAGTCAAGAGTGTGAGAGAGATTGAAGAG 

MQLVASVLTLSLTS 

Nco I 

AGCGAAATGGGACACTACACCATATCAGGAATACGTTTTCCTTGTGCTCCACTCCGCAAATCTCAATCTACCGGCTTCCATGGTGATCGAAGGACCTCCTCTTGCCTTTCCTTCAACTTC 

240 

TCGCTTTACCCTGTGATGTGGTATAGTCCTTATGCAAAAGGAACACGAGGTGAGGCGTTTAGAGTTAGATGGCCGAAGGTACCACTAGCTTCCTGGAGGAGAACGGAAAGGAAGTTGAAG 

QRNGTLHHIRNTFSLCSTPQISIYRLPW 

MGHYTISGIRFPCAPLRKSQSTGFHGDRRTSSCLSFNF 

AAGAAGGCGGCGTTTTCTAGGAGGGTCTTCTCTGGAAAGTCATCTCATGAATCTGACTCCTCAAATGTAATGGTCACTGCGTCTAAAAGAGTCCTTCCTGATGGTCGGATTGAATGCTAT 

TTCTTCCGCCGCAAAAGATCCTCCCAGAAGAGACCTTTCAGTAGAGTACTTAGACTGAGGAGTTTACATTACCAGTGACGCAGATTTTCTCAGGAAGGACTACCAGCCTAACTTACGATA 
KKAAFSRRVFSGKSSHESDSSNVMVTASKRVLPDGRIECY 

TCTTCTTCAACAGATCAATTGGAAGCCCCTGGCACAGTTTCAGAAGAATCCCAGGTGCTTACTGATGTTGAGAGTCTCATTATGGATGATAAGATTGTTGAAGATGAAGTAAATAAAGAA 

480 

AGAAGAAGTTGTCTAGTTAAGCTTCGGGGACCGTGTCAAAGTCTTCTTAGGGTCCACGAATGACTACAACTCTCAGAGTAATACCTACTATTCTAACAACTTCTACTTCATTTATTTCTT 
SSSTDQLEAPGTVSEESQVLTDVESLIMDDKIVEDEVNKE 

Xmn I Hind III 

TCTGTTCCAATGCGGGAGACAGTTAGCATCGGAAAAATTGGATCTAAACCAAGGTCCATTCCTCCACCCGGCAGAGGGCAAAGAATATATGACATAGATCCAAGCTTGACAGGCTTTCGT 

600 

AGACAAGGTTACGCCCTCTGTCAATCGTAGCCTTTTTAACCTAGATTTGGTTCCAGGTAAGGAGGTGGGCCGTCTCCCGTTTCTTATATACTGTATCTAGGTTCGAACTGTCCGAAAGCA 
SVPMRETVSIGKIGSKPRSIPPPGRGQRIYDIDPSLTGFR 

Hinc II Nsi I 

CAACACCTAGATTACCGGTATTCACAGTACAAAAGACTCCGAGAAGAAATTGACAAGTATGAAGGTAGTCTGGATGCATTTTCTCGTGGCTATGAAAAGTTTGGTTTCTCACGCAGTGAA 

720 

GTTGTGGATCTAATGGCCATAAGTGTCATGTTTTCTGAGGCTCTTCTTTAACTGTTCATACTTCCATCAGACCTACGTAAAAGAGCACCGATACTTTTCAAACCAAAGAGTGCGTCACTT 
QHLDYRYSQYKRLREEIDKYEGSLDAFSRGYEKFGFSRSE 



Bgl II 

ACAGGAATAACTTATAGAGAGTGGGCACCAGGAGCTACGTGGGCTGCATTGATTGGAGATTTCAATAACTGGAATCCTAATGCAGATGTCATGACTCAGAATGAGTGTGGTGTCTGGGAG 

840 

TGTCCTTATTGAATATCTCTCACCCGTGGTCCTCGATGCACCCGACGTAACTAACCTCTAAAGTTATTGACCTTAGGATTACGTCTACAGTACTGAGTCTTACTCACACCACAGACCCTC 

TGITYREWAPGATWAALIGDFNNWNPNADVMTQNECGVWE 

Nco I Xho I

ATCTTTTTGCCGAATAATGCAGATGGTTCACCACCAATTCCCCATGGTTCTCGAGTAAAGATACGCATGGATACTCCATCTGGCAACAAAGATTCTATTCCTGCTTGGATCAAGTTCTCA 

960

TAGAAAAACGGCTTATTACGTCTACCAAGTGGTGGTTAAGGGGTACCAAGAGCTCATTTCTATGCGTACCTATGAGGTAGACCGTTGTTTCTAAGATAAGGACCAACCTAGTTCAAGAGT 
IFLPNNAOGSPP IPHGSRVX IRMDTPSGNKOE ! PAWIKFS 

GTTCAAGCACCAGGTGAACTCCCATAT AATGGCATATACTATGATCCTCCCGAGGAGGAGAAGTATGTGTTCAAAAATCCTCAGCCAAAGAGACCAAAATCACTTCGGATTTATGAGTCG 

1080

CAAGTTCGTGGTCCACTTGAGGGTATATTACCGTATATGATACTAGGAGGGCTCCTCCTCTTCATACACAAGTTTTTAGGAGTCGGTTTCTCTGGTTTTAGTGAAGCCTAAATACTCAGC 

VOAPGELPYNG ! YYDPFEEEKYVFKNPOPKRPKSLR I YES 

Nde I Hind III

CACGTTGGAATGAGTAGTACGGAGCCACTAATTAACACATATGCGAACTTTAGAGATGATGTGCTTCCTCGCATCAAAAAGCTTGGCTACAATGCTGTTCAGCTCATGGCTATTCAAGAG 

1200

GTGCAACCTTACTCATCATGCCTCGGTCATTAATTGTGTATACGGTTGAAATCTCTACTACACGAAGGAGCGTAGTTTTTGGAACCGATGTTACGACAAGTCGAGTACCGATAAGTTCTC 
HVGMSS TEPV 1 NTYANFRODVLPR ] KKLGYNAVQLHA I Q E 

CATTCATATTATGCT AGTTTTGGGTATC ACGTCAC AAACTTTTATGC AGCTAGCAGCCGATTTGGAACTCCTGATGATTTAAAGTCCCTAGTAGATAAAGCTCACGAGTTAGGTCTTCTT 

1320

GTAAGTATAAT ACGATCAAAACCCATAGTGC AGTGTTTGAAAATACGTCGATCGTCGGCTAAACCTTGAGGACTACTAAATTTCAGGGATCATCTATTTCGAGTGCTCAATCCAGAAGAA 

HSYYASFGYHVTI^FYAASSRFGTPDDLKSLVOKAHELGLL 
Nsi I

GTTCTCATGGATATTGTTC ATAGCCATGCATCAAC TAATAC GT TGGATGGGCTGAATATGTTTGATGGTACGGATGGTCAC TACTTTC ACTCTGGACCACGGGGTCATCATTGGATGTGG 

1440

CAAGAGTACCTATAACAAGTATCGGTACGTAGTTGATTATGCAACCTACCCGACTTATACAAACTACCATGCCTACCAGTGATGAAAGTGAGACCTGGTGCCCCAGTAGTAACCTACACC 

VLMD1VHSHASTNTL0GLNMFDGTDGHYFHSGPRGHHWMW 

GACTCTCGCCTTTTCAACTATGGGAGCTGGG AGGTTC TAAGGTT' TC TTTC AAATACAAGGTGGTGGTTGGATGAGTAC AAGTTTGATGGGTTC AGATTTGATGGGGTGACTTCAATG 

1560

CTGAGAGC6GAAAAGTTGATACCCTCGACCC TCCAAGATTCC AAAGAAGAAAGTTTATGT7CCACCACCAACCTACTCATGTTCAAACTACCCAAGTCTAAACTACCCCACTGAAGTTAC 

DSRLFNYGSWEVLRFLL5NTRWWLDEYKF0GFRFDGVTS« 
Fig.2 (Cont).



ATGTACACCCATCATGGATTGCAGGTAGATTTCACCGGCAACTACAATGAATACTTTGGATATGCAACTGATGTACATGCTGTGGTrTATCTGATGCTGTTGAATGATATGATTCATGGT 

1680

TACATGTGGGTAGTACCTAACGTCCATCTAAAGTGGCCGTTGATGTTACTTATGAAACCTATACGTTGACTAC ATCTACGAC ACCAAATAGACTACGAC AACTTACTATACTAAGTACCA 

HYTHHGLQVDFTGNYNEYFGYATOVDAVVVLMLLNOM I HG 

CTCTTCCCAGAGGCTGTCACCATTGGTGAAGATGTTAGTGGAATGCCAACAGT TTGCA TTCCGGTTGAAGATGGTGGTGTTGGCTTTGATTATCGTCTCC ACATGGCTGTTGCTGATAAA 

1800

GAGAAGGGTCTCCGACAGTGGTAACCACTTCTACAATCACCTTACGGTTGTCAAACGTAAGGCCAACTTCTACCACCACAACCGAAACTAATAGCAGAGGTGTACCGACAACGACTATTT 

LFPEAVT t GEDVSGMPTVC 1PVEDGGVGFDYRLHMAVADK 

Nde I 

TGGGTTGAGATTATTCAGAAGAGAGATGAAGATTGGAAAATGGGTGACATTGTACATATGCTGACCAACAGGCGGTGGTTGGAAAAGTGTGTTTCTTATGCTGAAAGTCATGACCAGGCC 
ACCCAACTCTAATAAGTCTTCTC7CTACTTCTAACCTTTTACCCACTGTAACATGTATACGACTGGTTGTCCGCCACCAACCTTTTCACAC AAAGAATACGACTTTCAGTACTGGTCCGG 
W V E ! 1 OKROEDWKKGD I VHHLTNRRWLEKCVSYAE5HDQA 

CTTGTTGGTGACAAAAC TATTGCATTTTGGC TGATGGAC AAGGATATGTATGAC TTC ATGGCTCGTGACAGACCATC T ACTCCTCTTATAGATCGTGGAATAGCATTGCAC AAAATGATC 

2040

GAACAACCACTGTTTTGATAACGTAAAACCGACTACCTGTTCCTATACATACTGAAGTACCGAGCACTGTCTGGTAGATGAGGAGAATATCTAGCACCTTATCGTAACGTGTTTTACTAG 
LVGOK T I AFWLMOKDMYDFMARDRPSTPL I DRG I ALHKH I 

Nco I 

AGGCTTATTACCATGGGCTTAGGCGGAGAAGGATATTTGAATTTTATGCGAAATGAATTTGGACATCCTGAGTGGATTGATTTTCCAAGAGGGGATCGACATC TCCCC AATGGTAAAGTA 

2160

TCCGAATAATGGTACCCGAATCCGCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCTGTAGGACTCACCTAACTAAAAGGTTCTCCCCTAGCTGTAGACGGGTTACCATTTCAT 

R L 1TMGLGGECYLNFMGNEFGHPEWI DFPRGDRHLPNGKV 

EcoR V 

ATTCCAGGCAACAACCACAGTTATGATAAATGCCGTCGTAGATTTGATCTAGGTGATGCAGACTATCTAAGATATCATGGAATGCAAGAGTTTGATCAGGCAATGCAACATCTTGAAGAA 

2280

TAAGGTCCCTTGTTGGTGTC AATACTATTTACGGCAGCATCTAAACTAGATCCACTACGTCTGATAGATTCTATAGTACCTTACGTTCTCAAACTAGTCCGTTACGTTGTAGAACTTCTT 

IPGNNHSYOKCRRRFDLGOADYLRYHGKQEFDQAttQHLEE 

GCCTATGGTTTCATGACTTCTGAGC ACCAGTATATATCACGGAAGGATGAAGGAGATCGGATCATTGTCTTTGAGAGGGGAAACCTTGTTTTTGTATTCAACTTTCATTGGACTAAC AGC 

2400

CGGATACCAAAGTACTGAAGACTCGTGGTCATATATAGTGCCTTCCTACTTCCTCTAGCCTAGTAACAGAAACTCTCCCCTTTGGAACAAAAACATAAGTTGAAAGTAACCTGATTGTCG 
AYGFMTSEHQY I SRKDEGOR I I VFERGNLVFVFNFHWTNS 

TATTCAGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGGACTCCGATGATGGCTTGTTTGGAGGCTTCAACAGGCTTAGTCATGATGCCGAGrACTTCACCTTT 

2520

ATAAGTCTAATGGCTCAACCGACGAAGTTCAGTCCTTTCATGTTCTAACAAAACCTGAGCCTACTACCGAACAAACCTCCGAAGTTGTCCGAATCAGTACTACGGCTCGTGAAGTGGAAA 

YSDYRVGCFK SGKYK 1 VLDSDDGLFGGFNRLSHOAENFTF 

GACGGGTGGTATGATAACCGGCCTCGGTCCTTCATGGTATATGCACCATCTAGGACAGCAGTGGTCCATGCTTTAGTAGAAGATCAAGAGAATGAAGCAGAGAATGAAGTAGAAAGTGAA 

2640

CTGCCCACCATACTATTGGCCGGAGCCAGGAAGTACCATATACGTGGTAGATCCTGTCGTCACCAGGTACGAAATCATCTTCTACTTCTCTTACTTCGTCTCTTACTTCATCTTTCACTT 

DGWYONRPRSFMVYAPSRTAVYHALVEDEENEAENEVESE 

BamH I Hinc II 

GTGAAACCAGCCTCCGGCTGAGATAGATATTTAGTAAGAGGATCCCCTAAAGCAGGAATGGTTAACCTGTGCATCTGCATTGAACGACGTATATTGAGACTTGAATTGATTTGCTGCTCA 



CACTTTGGTCGGAGGCCGACTCTATCTATAAATCATTCTCCTAGGGGATTTCGTCCTTACC AATTGGACACGTAGACGTAACTTGCTGCATATAACTCTGAACTTAACTAAACGACGAGT 
V K P A S G 

Ssp I Nsi I Nde I

GGACACAGAATATTAATTCCAAGGCTCAAGGCAGAGATACACGCCATAATGCATGATCATATGAAAGCTCCCCAACTTGTAAATCATTTAGCAAGCTGCGTGCACTCTGTAAATTATATG 
CCTGTGTCTTATAATTAAGGTTCCGAGTTCCGTCTCTATGTGCGGTATTACGTACTAGTATACTTTCGAGGGGTTGAACATTTAGTAAATCGTTCGACGCACGTGAGACATTTAATATAC 

Sca I Nco I

TAGTACTTTGGCAAGTCACGTTATTATGGATACCATGGATGTCCGCTAGGAAAAATTTTGTGTATACGCCTACTAGGATTTTTAAATCTCGCATGTTCCACATAAAGTGGTGGTTGAATG 
ATCATGAAACCGTTCAGTGCAATAATACCTATGGTACCTACAGGCGATCCTTTTTAAAACACATATGCGGATGATCCTAAAAATTTAGAGCGTACAAGGTGTATTTCACCACCAACTTAC 

Xmn I

T7GCGCGACTATTTTTGAGTAAAATGATTGAAGTTATTCTTCACTTGGGCCTGTGAAAAAAAAAAAAAAAAAAA 

3074

AACGCGCTGATAAAAACTCATTTTACTAACTTC AATAAGAAGTGAACCCGGACAC7TTTTTTTTTTTTTTTTTT 
Fig.4.



Nco I 

CTCTCTAACTTCTCAGCGAAATGGGACACTACACCATATCAGGAATACGTTTTCCTTGTGCTCCA CTCTGCAAATCTCAATCTACCGGCTTCCATGGCTATCGGAGGACCTCCTCTTGCC 
GAGAGATTGAAG^ Z ' CGC TTT AC CCTGTGATGTGGT ATAGTCC TTATGC AAAAGGAAC ACGAGGTGAGArG" , AGAGTTAGATGGCCGAAGGTAC CGATAGCCTCCTGGAGGAGAACGG 

MGHYT I SG1RFPCAPLCKSQSTGFHGYRRTSSC 

TTTCCTTCAACTTCAAGGAGGCGTTTTCTAGGAGGGTCTTCTCTGGAAAGTCATCTCATGAATCTGACTCCTCAAATGTAATGGTCACTGCTTCTAAAA GAGTCCTTGCTGATGGTCGGA 
AAAGGAAGTTGAAGTTCCTCCGCAAAAGATCCTCCCAGAAGAGACCTTTCAGTAGAGTACTTAGACTGAGGAGTTTAC ATTACCAGTGACGAAGATTTTCTCAGGAAGGACTACCAGCCT 

LSFNFKEAFSRRVFSGKSSHESDSSNVMVTASKRVLPDGR 

TTGAATGCTATTCTTCTTCAAC AGATCAATTGGAAGCCCCTGGCACAGTTTC AGAAGAATCCCAGGTGCTTACTGATGTTGAGAGTCTCATTATGGATGATAAGATTGTTGAAGATGAAG 
AACTTACGATAAGAAGAAGTTGTCTAGTTAACCTTCGGGGACCGTGTC AAAGTCTTCTTAGGGTCCACGAATGACTACAACTCTCAGAGTAATACCTACTATTCTAACAAC TTCTACTTC 

I£CY^SSSTOQLEAPGTVSF.ESOVLTDVESL!«DDK ] VEDE 

Xmn I Hind III 

TAAATAAAGAATCTGTTC C AATGC GGGAGACAGT TAGC A7C AGAAAAATTGGAT CT AAACC AAGGTCC ATTCCTCC ACCCGGCAGAG6GC A AAGAA T ATATGAC ATAGATCC AAGCTTGA 
ATTTATTTCTTAGACAAGGTTACGCCCTCTCTCAATCGTAGTCTTTTTAACCTAGATTTGGTTCCAGGTAAGGAGGTGGCCCGTCTCCCGTTTCTTATATACTGTATCTAGGTTCGAACT 

VNKESVPHRETVSIRKIGSKPRSIPPPGRGQRIYOIOPSL 

Hinc II Nsi I

CAGGCTTTCGTCAACACCTAGATTACCGGTATTCACAGTACAAAAGACTCCGAGAAGAAATTGACAAGTATGAAGGTAGTCTGGATGCATTTTCTCGTGGCTATGAAAAGTTTGGTTTCT 

1 ( , 1 , 1 1 1 1 ■ 1 1 . 1 i 1 1 > 1 1 ► i 

GTCCGAAAGCAGTTGTGGATCTAATGGCCATAAGTGTCATGTTTTCTGAGGCTCTTCTTTAACTGTTCATACTTCCATC AGACCTACGTAAAAG AGC AC CGATACTTTTC AAACC AAAGA 

TGFRQHLDYRYSOYKRLREE ) DKYEGSLDAF SRGYEKFGF 

CACGCAGTGAAACAGGAATAAC TTAT AGAGAGTGGGC ACC AGGAGC T ACGTGGGCTCCATTGAT TGGAGATTTCAAT AAC T GGA ATCCTAATGC AGATGTC ATGAC TC AGAATGAGTGTG 
GTGCGTCACTTTGTCCTTATTGAATATCTCTC ACCCGTGGTCCTCGATGCACCCGACGTAACTAACCTCTAAAGTTATTGACCTTAGGATT ACGTC TAC AGTACTGAGTC TTAC TCAC AC 

SRSETG 1 TYREWAPGATWAAL 1 GOFNNWNPNAD VMTQNEC 
Bgl II Nco I Xho I

GTGTCTGGGAGATCTT TTTGCCGAAT AATGC AGATGGTTC AC C ACC AA T TCCCC ATGGTTC TCGAGTAAAGAT ACGC AT GG ATAC TCC ATC TGGC AACAAAGATTC TATTCCTGCTTGGA 

' 1 1 -* ' ' — i ' • ~ 1 ■ 1 ■ — ■ ' ' 1 ' 1 > 1 ' h t 

CACAGACCCTCTAGAAAAACGGCTTATT ACGTC TAC CAAGTGGTGGTTAAGGGGTACCAAGAGCTC ATTIC TATGCGT ACC TATGAGGTAGACCGTTGTTTCTAAGATAAGGACGAACCT 

GVWE I FLPNNAOGSPP ' PHGSRVK I RMDTPSGNKDS I PAW 

TCAAGTTCTCAGTTCAAGCACC AGGTGAACTCCCATATAATGGCATAT ACTATGATCCTCCCGAGGAGGAGAAGTATGTGTTCAAAAATCC TCAGCC AAAGAGACCAAAATCACTTCGGA 
■ . 1 1 1 » 1 1 1 1 1 > *- ,. - ( 1 , ( , 1 1 1 1 (- i 

AGTTCAAGAGTCAAGTTCGTGGTCCACT TGAGGGTAT A TTACCGT ATATGAT AC T AGGAGGGCTCC TCCTC TTC AT AC AC A AGTT TTT AGGAGTCGGTT T CTCTGGT TTTAGTGAAGCCT 
IKFSVQAPGELPYNG i YYDPPEEEKYVFKNPQPKRPKSIR 

Hind III

TTTATGAGTCGCACGTTGGAATGAGTAGTACGGAGC CAGT AA TT AAC AC ATA TGCC AAC TT T AGAGATGAT GTGC TTCCTC GC ATCAAAAAGCT TGGCTAC AAT GCTGTTC AGC TCATGG 
AAATACTCAGCGTGCAACCTTACTCATC ATGCCTCGGTCATTAATTG'GT ATACGGTTGAAATC TCTACTACACGAAGGAGCGTAGTTTTTCGAACCGATGTTACGACAAGTCGAGTACC 

I YESHVGMSSTEPV 1 NT YANFRDDVL PR I KKLGYNAVQLM 

CTATTCAAGAGC ATTCATATTATGCTAGTTTTGGGTATCACGTC ACAAAC TT^TATGCAGCTAGCAGCCGATTTG GAAC TCCTGATGAT TTAAAG TC TC TA AT AGAT AAAGCTCACGAGT 
GATAAGTTCTCGTAAGTATAATACGATCAAAACCCATAGTGCAGTGTTTGAAAATACGTCGATCGTCGGCTAAACCTTGAGGACTACTAAATTTCAGAGATTATCTATTTCGAGTGCTCA 

AIQEHS VY AS' r G v HV-NrrAASSRFGTPODLKSL i OKAHE 

Nsi I

TAGGTCTTCTTGTTCTCATGGATATTGTTCAT AGCC ATGCATC AAC T A AT AC GTTGGATGGGC TG AATATGTTTG ATGGT ACGGATGGTCACT ACTTTC AC TCTGGACCACGGGGTCATC 
ATCCAGAAGAACAAGAGTACCTATAACAAGTATCGGTACGTAGT'GAT-ATGCAACCTACCCGACTTATACAAACTACCATGCCTACCAGTGATGAAAGTGAGACCTGGTGCCCCAGTAG 

LGLLVLHD I YHSHAS T NT L QGLNMF0_GTOGHYFHSGPRGH 
Fig.4 (Cont).



1600 



ATTGGATCTGGGACTCTCGCCTTTTC AACTATG6GAGCTGGGAGGTTCTAAGGTTTCTTCTTTC AAATGCAAGGT3GTGGTTGGATGAGTACAAGTTTGATGGGTTCAGATTTGATGGGG 

1440

TAACCTACACCCTGAGAGCGGAAAAGTTGATACCCTCGACCCTCCAAGATTCCAAAGAAGAAAGTTTACGTTCCACC ACCAACCTACTCATGTTCAAACTACCCAAGTCTAAACTACCCC 

HWHWOSRLFNYGSWEVLRFLLSNARWWLOEYKFOGFRFDG 

TGACTTCAATGATGTACACCCATCATGGATTGCAGGTAGATTTTACCGGCAACTACAATGAATACTTTGGATATGCAACTGATGTAGATGCTGTGGTTTATTTGATGCTGTTGAATGATA 

1560

ACTGAAGTTACTACATGTGGGTAGTACCTAACGTCCATCTAAAATGGCCGTTGATGTTACTTATGAAACCTATACGTTGACTACATCTACGACACCAAATAAACTACGACAACTTACTAT 

VTSMMYTHHGLOVDFTGNYNEYFGYATDVDAVVYLHLLND 

TGATTCATGGTCTCTTCCCAGAGGCTGTCACCATTGGTGAAGATGTTAGTGGAATGCC AACAGTTTGCATTCCGGTTGAAGATGGTGGTGTTGGCTTTGATTATCGTCTCCACATGGCTG 

1680

ACTAAGT ACCAGAGAAGGGTCTCCGACAGTGGTAACC ACTTCTACAATCACCTTACGGTTGTCAAACGTAAGGCCAACTTCTACCACCACAACCGAAACTAATAGCAGAGGTGTACCGAC 

HI HGLFPEAVT I GEDVSGHPTVC ! PVEDGGVGFD'YRLHtIA 

TTGCTGATAAATGGGTTGAGATTATTC AGAAGAGAGATGAAGATT GGAAAATGGGTGACATTGTACATATGCTGACCAACAGGCGGTGGTTGGAAAAGTGTGTTTCTTATGCTGAAAGTC 
AACGACTATTTACCCAACTCTAAT AAGTCTTCTCTCTACTTCTAACCTTTTACCCACTGTAACATGTATACGACTGGTTGTCCGCCACCAACCTTTTCACACAAAGAATACGACTTTCAG 

VADXWVE i ! QKR OEDWKMGO I VHMLTNRRVLEKC VSYAE5 

ATG ACCAGGCCCTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGATATGT^ 

TACTCGTCCGGGAACAACCACTGTTTTGATAACGTAAAACCGACTACCTGTTCCTATACATACTGAAGT ACCGAGAACTGTCTGGTAGATGAGGAGAGTATCTAGCACCTCATCGTAACG 
HDOALVGDKT ! AFWLKDKOMYDFKALDRPSTPLfORGVAL 
Bcl I Nco I

ACAAAATGATCAGCCTTATTACCATGGGATTAGGCGGAGAAGGATATTTGAATTTTAT^ q 
TGTTTTACTAGTCCGAATAATGGTACCCTAATCCGCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCTGTGGGGCTCACCTAACTAAAAGGTTCTCCACTAGATGTAGAAGGGT 

H K M ! RL I TMGLGGEGYLNFMGNEFGHPEWI DFPRGDLHLP 

EcoR V Bcl I

GTGGTAAATTTGTTCCTGGGAACAATTACAGTTATGATAAATGCCGGCGTAGGTTTGATCTAGGCAATTCAAAGCATCTGAGATATC ATGGAATGCAAGAGTTTGATCAAGCAATTCAGC 

2160

CACCATTTAAACAAGGACCCTTGTTAATGTC AATACTATTTACGGCCGCATCCAAACTAGATCCGTTAAGTTTCGTAGACTCTATACTACCrTACGTTCTCAAACTAGTTCGTTAAGTCG 

SGKFVPGIsfNYSYDKCRRRFDLGNSKHLRYHGHQEFDOAIQ 

ATCTTGAAGAACCCTATGGTTTC ATGACTTCTGAGCACCAATACATATCALGGAAGGATGAAAGGGATCGGATC ATTGTCTTCGAGAGGGGAAACCTCGTTTTTGTAT TCAATTTTCATT 

2280

TAGAACTTCTTCGGATACCAAAGTACTGAAGACTCGTGGTTATGTATAGTGCCTTCCTACTTTCCCTAGCCTAGTAACAGAAGCTCTCCCCTTTGGAGCAAAAACATAAGTTAAAAGTAA 

HLEEAYGFMTSEHOY I SRKOERDR I I VFERGNLVFVFNFH 

GGACTAGCAGCTATTCGGATT ACCGAGTTGGCTGCTTAAAGCCAGGAAA 

CCTGATCGTCGATAAGCCTAATGGCTCAACCGACGAATTTCGGTCCTTTCATGTTCTATCAGAACCTAAGTCTACTAGGAAACAAACCTCCGAAACCGTCCGAATCAGTACTACGTCTCG 

W 3YSDYRVGCLKPGKYKIVLDSDDPLFGGFGRLSHDAE 

ACTTCAGCTTTGAAGGGTGGTACGATAACCGGCCTCGATCCTTCATGGTGTACA^ ^ q 
TGAAGTCGAAACTTCCCACCATGCTATTGGCCGGAGCTAGGAAGT ACCAC ATGTGTGGTACATCTTGTCGTCACCAGATACGAAATCACCTCCTACTTCACCTCTTACTTAACCTTGGAC 

HFSFEGWYONRPRSFHVYTPCRTAVVYALYEDEVENELEP 
TCGCCGGTTAAGATATATCTTAACAACAGGTTCTGAAGCAGGAATGCCATTATTGATCTTCCTATGTT 



AGCGGCCAATTCTATATAGAATTGTTGTCCAAGACTTCGTCCTTACGGTAATAACTAGAAGGATACAA 



2588
f-60 *70 4"80 +-90 ^100 fllO +-120 

125*94. seq TAGTTTTGGGTACCATGTCACAAACTTTTTTGCACCTAGCAGCCGATTTGGAACTCCTGATGATTTGAAG 
TAGTTTTGGGTA CA GTC AC AAACTTTT TGCA CTAGCAGCCGATTTGGAACTCCTGATGATTT AAG 

1 16. seq TAGTTTTGGGTATCACGTCACAAACTTTTATGCAGCTAGCAGCCGATTTGGAACTCCTGATGATTTAAAG 
M140 ^1 150 ^1 160 ^1 170 ^1 180 *1190 M200 

+-130 *140 4-150 *-160 +-170 +-180 f190 

125*94 seq TCTTTAATAGATAAAGCTC ATGAGTT AGGGCTGCTTGTTCTCATGGAT ATTGTTCATAGCC ATGCGTCAA 
TCI TAATAGATAAAGCTCA GAGTTAGG CT CTTGTTCTC ATGG AT ATTGTTCATAGCC ATGC TCAA 

1 16. seq TCTCTAATAGATAAAGCTCACGAGTTAGGTCTTCTTGTTCTCATGGATATTGTTCATAGCCATGCATCAA 
*1210 ^1220 M230 M240 *1250 M260 *1270 

+-200 +-2 10 f220 +-230 +-240 +-250 +-260 

125+94. seq ATAATACGTTGGATGGGCTGAACATGTTTGATGGTACGGATAGTCACTACTTCCACTCCGGATCACGGGG 
TAATACGTTGGATGGGCTGAA ATGTTTGATGGTACGGAT GTCACTACTT CACTC GGA CACGGGG 

1 16. seq CT AATACGTTGGATGGGCTGAATATGTTTGATGGTACGGATGGTCACTACTTTCACTCTGGACCACGGGG 
M280 M290 M300 *13tO M320 <M330 *-1340 

+-270 +-280 #-290 ^300 *310 >r320 4^330 

125*94. seq TC ATC ATTGGTTGTGGGACTCTCGCCTTTTCAACTATGGAAGCTGGGAGGTGCTAAGATTTCTTCTTTCA 
TCATC ATTGG TGTGGGACTC CGCCTTTTC AACTATGG AGCTGGGAGGT CTAAG TTTCTTCTTTCA 

1 16 seq TCATC ATTGG ATGTGGGACTCC CGCCTTTTC A ACT ATGGGAGCTGGGAGGTTCTAAGGTTTCTTCTTTCA 

M350 M360 ^1370 M380 M390 ^1400 <M410 

+-340 4"350 4-360 ^370 +-380 ^390 +-400 

125*94 seq AATGCA AGATGGTGGTTGGAAGAGTAC AGGTTTGATGGTTTTAGATTTGATGGGGTGACTTCCATGATGT 
AATGCAAG TGGTGGTTGGA GAGTACA GTTTGATGG TT AGATTTGA GGGGTGACTTC ATGATGT 

1 16 seq AATGCAAGGTGGTGGTTGGATGAGTACAAGTTTGATGGGTTCAGATTTGACGGGGTGACTTCAATGATGT 
*-1420 *1430 M440 M450 M460 *1470 ^1480 

4-410 +-420 +-430 +-440 +-450 +-460 +-470 

125*94. seq ACACTCCCCATGGGTTGC AGGTAGCTTTTACTGGCAACTACAATGAGTACTTTGGATATGCAACTGATGT 
ACAC C CATGG TTGCAGGTAG TTTTAC GGC AACTAC AATGA T ACTTTGGATATGCAACTGATGT 

1 16 seq AC AC C C ATC AT GG AT TGCAGGTAGATTTTACC GGC A ACT AC AATGA AT ACTTTGGATATGCAACTGATGT 
^1490 ^1500 *-1510 ^1520 - ^1530 M540 <M550 

+-480 4-490 4-500 +-5 10 4-520 +-530 +-540 

125*94. seq AGATGCTGTGATTTATTTGATGCTTGTGAATGATATGATTCACGGTCTTTTCCCTGAGGCTGTTACCATT 
AGATGCTGTG TTT ATTTGATGCT TGAATGATATGATTCA GGTCT TTCCC GAGGCTGT ACCATT 

1 16. seq AGATGCTGTGGTTTATTTGATGCTGTTGAATGATATGATTCATGGTCTCTTCCCAGAGGCTGTCACCATT 
^1560 ^1570 M580 *-1590 *1600 M610 ^1620 
4-550 4-560 4-570 4-580 +-590 ^600 +-6 10 

125*94. seq GGTGAAGATGTTAGCGGAAAGCCAACATTTTGCATTCCAGTGGAAGATGGTGGTGTTGGATTTGATTACC 
GGTGAAGATGTTAG GGAA GCCAACA TTTGCATTCC GT GAAGATGGTGGTGTTGG TTTGATTA C 

1 16 seq GGTGAAGATGTT AGTGGAATGCCAACAGTTTGCATTCCGGTTGAAGATGGTGGTGTTGGCTTTGATTATC 

M630 ^1640 M650 <M660 ^1670 M680 ^1690 

4-620 4-630 4-640 +-650 r660 ^670 ^680 

125*94. seq GTCTCC AC ATGGCCATTGCCGATAAATGGATTGAGATTCTTAAGAAGAGAGATGAGGACTGGAAAATGGG 
GTCTCC ACATGGC TTGC GATAAATGG TTGAGATT TT AGAAGAGAGATGA GA TGGAAAATGGG 

1 16. seq GTCTCC ACATGGCTGTTGCTGAT AAATGGGTTGAGATTATTCAGAAGAGAGATGAAGATTGGAAAATGGG 
M700 ^1710 ^1720 *-1730 M740 ^1750 *1760 

+-690 +700 4-710 4-720 4-730 4-740 +750 

125*94. seq TGACATTGTGCATACACTC ACCAACAGAAGGTGGTTGGAAAAATGTGTTGCTTATGCTGAAAGTCATGAC 
TGACATTGT CATA CT ACCAACAG GGTGGTTGGAAAA TGTGTT CTTATGCTGAAAGTC ATGAC 
116. seq TGAC ATTG T AC AT ATGC TGACC A AC AGGCGGTGGTTGGA A A AGTGTGTTT CTTATGCTGAAAGTC ATGAC 
^1770 M780 *1790 *-1800 M810 *-1820 M830 

+760 4-770 +780 +790 4-800 +-8 10 +-820 

125*94. seq C AAGCTCTTGTTGGTGACAAAACTATTGC ATTTTGGCTGATGGACAAGGAC ATGTACGACTTCATGGCTC 
CA GC CTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGA ATGTA GACTTCATGGCTC 
1 16. seq C AGGCCCTTGTTGGTGAC AAAACTATTGCATTTTGGCTGATGGACAAGGATATGTATGACTTCATGGCTC 
♦-1840 *-1850 ^1860 *-1870 M880 M890 M900 

+-830 4-840 4-850 +-860 4-870 4^880 +-890 

125*94 seq GTGAC AGACC ATCTACTCCTCTTATAGATCGTGG AAT AGC ATTGC ACAAAATGATCAGGCTTATTACC AT 
TGACAGACCATCTAC CCTCT ATAGATCGTGGA T AGCATTGC AC AAAATGATCAGGCTTATTACC AT 
1 16. seq TTGAC AGACC ATC TACCCCTCT CAT AG ATC GTGG-AGT AGC ATTGC AC AAAATGATCAGGCTTATTACC AT 
M910 M920 ^1930 M940 *-1950 *1960 M970 

+-900 4-910 4-920 +-930 +-940 +-950 +-960 

125*94. seq GGGCTTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACATCCTGAGTGGATTGATTTT 
GGG TT AGGCGGAGAAGGAT ATTTGAATTTT ATGGGAAATGAATTTGGAC A CC GAGTGGATTG ATTTT 
1 16. seq GGG ATT AG GCGG AG A AGG AT ATTTGAATTTT AT GGG A A ATG A ATTTGG AC AC CCC GAGTGGATTG ATTTT 
*-1980 M990 ^-2000 *-2010 ^2020 ^2030 *2040 
Fig.5 (Cont).

4-970 4-980 f990 *-1000 ^1010 ^1020 *-1030 

125*94. seq C C AAG A G GG G ATC G AC AT C TGC C C AATGGT A A AGT A ATTCCAGGG AAC A AC C ACAGTTATGATAAATGCC 

CCAAGAGG GATC ACATCT CCCA TGGTAAA T TTCC GGGAACAA ACAGTTATGATAAATGCC 
1 16. seq C C AAG AG GT G ATC T AC AT C TT CC CAG T GGT A A ATT T GT TC CI GGG A AC A ATT ACAGTTATGATAAATGCC 

^2050 *2060 *2070 ^2080 ^2090 *2100 ^2110 

f1040 4*1050 4*1060 4-1070 4-1080 4-1090 *-1 100 

125*94. seq GTCGTAGATTTGATCTAGGTGATGC AGACT ATCTAAG ATATCATGGAATGCAAGAGTTTGATCAGGCAAT 

G CGTAG TTTGATCTAGG AT CA A ATCT AGAT ATC ATGGAATGCAAGAGTTTGATCA GCAAT 
116. seq GGCGTAGGTTTGATCTAGGC AATTCAAAGC ATCTGAGATATCATGGAATGCAAGAGTTTGATCAAGCAAT 

^2120 ^2130 ^2140 ^2150 *2160 ^2170 ^2180 

4-1110 ^1 120 4-1130 4-1140 ^1 150 4^1160 ^1 170 

125*94. seq GC AAC ATCTTGAAGAAGCCTATGGTTTC ATGACTTCTGAGCACCAGTATATATCACGGAAGGATGAAGGA 

CA CATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCA T A ATATCACGGAAGGATGAA G 
1 16. seq TCAGC ATCTTGAAGAAGCCTATGGTTTC ATGACTTCTGAGCACC AATAC ATATCACGGAAGGATGAAAGG 

^2190 ^2200 ^2210 ^2220 *2230 *2240 ^2250 

4-1180 4*1190 4-1200 +-1210 f1220 4-1230 4^1240 

125*94. seq GATC G GATC ATTGTCTTTG AG AGGGGAAACCTTG TTTTTGTATTC AAC TTTCATTGG ACT AAC AGCT ATT 

GATCGGATCATTGTCTT GAGAGGGGAAACCT GTTTTTGTATTCAA TTTCATTGGACTA CAGCTATT 
116. seq GATCGGATC ATTGTCTTCGAGAGGGGAAACCTCGTTTTTGTATTCAATTTTCATTGGACTAGCAGCTATT 

^2260 ^2270 ^2280 ^2290 ^2300 ^23 10 ^2320 

^1250 4-1260 4-1270 +M280 ^1290 4-1300 4-1310 

125*94. seq C AGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGGACTCGGATGATGGCTTGTT 

C GATTACCGAGTTGGCTGCTT AAG CAGGAAAGTACAAGAT GT TTGGA TC GATGAT TTGTT 
1 16. seq CGGATTACCGAGTTGGCTGCTTAAAGCCAGGAAAGTACAAGATAGTCTTGGATTCAGATGATCCTTTGTT 

^2330 ^2340 ^2350 ^2360 ^2370 ^2380 ^2390 

4-1320 4-1330 4-1340 4-1350 4-1360 <H370 ^-1380 

125*94. seq TGGAGGCTTCAACAGGCTTAGTCATGATGCCGAGCACTTCACCTTTGACGGGTGGTATGAT AACCGGCCT 

TGGAGGCTT CAGGCTT AGTCATGATGC GAGCACTTCA CTTTGA GGGTGGTA GATAACCGGCCT 
1 16. seq TGGAGGCTTTGGCAGGCTTAGTCATGATGCAGAGC ACT TCAGC TTTGAAGGGTGGTACG AT AACCGGCCT 

^2400 ^2410 ^2420 ^2430 ^2440 ^2450 ^2460 

4-1390 4-1400 4-1410 4-1420 4-1430 f 1440 +-1450 

125*94. seq CGGTCCTTC ATGGT ATATGCACCATCTAGGACAGCAGTGGTCCATGCTTTAGTAGAAGATGAAG 

CG TCCTTC ATGGT TA CACCAT TAG ACAGCAGTGGTC ATGCTTTAGT GA GATGAAG 
1 16. seq CGATCCTTC ATGGTGTACACACCATGTAGAACAGCAGTGGTCTATGCTTTAGTGGAGGATGAAG 

*-2470 <-2480 ^2490 ^2500 ^2510 ^2520 ^2530 
Fig.6.



<r10 ^20 ^30 f40 *-50 ^60 /70 

125-94 pro SFGYHVTNFFAPSSRFGTPDDLKSL I DK AHELGLLVLMD I VHSHASNNTLDGLNMFDGTDSHYFHSGSRG 
SFGYHVTNF: A: SSRFGTPDDLKSL I DK AHELGLLVLMD I VHSHAS. NTLDGLNMFDGTD: HYFHSG: RG 
1 16 pro SFGYHVTNFYAASSRFGTPDDLKSL I DK AHELGLLVLMD I VHSHASTNTLDGLNMFDGTDGHYFHSGPRG 
^370 ^380 *-390 *400 ^410 *-420 M30 

^80 ^90 flOO *110 *-120 ^130 f140 

125-94 pro HHWLWDSRLFNYGSWEVLRFLLSNARWWLEEYRFQGFRFDGVTSMMYTPHGLQVAFTGNYNEYFGYATDV 
HHW- WDSRLFNYGSWEVLRFLLSNARWWL: EY: FDGFRFDGVTSMMYT. HGLQV. FTGNYNE YFGYATDV 
1 16 pro HHWMWDSRLFNYGSWEVLRFLLSNARWWLDEYKFDGFRFDGVTSMMYTHHGLQVDFTGNYNEYFGYATDV 
^440 ^450 HGO *470 ^480 *490 *500 

*r150 ^160 ^170 *-180 €190 *200 *-210 

125-94 pro DAV I YLML VNDM I HGLFPEAVT I GEDV5GKPTFC I PVEDGGVGFDYRLHMA I ADKWI E I LKKRDEDWKMG 
DAV YLML: NDM I HGLFPEAVT I GEDVSG. PT C I PVEDGGVGFDYRLHMA: ADKW: E3: : KRDEDWKMG 
116 pro DAVVYLMLLNDM 1 HGLFPEAVT I GEDVSGMPTVC I PVEDGGVGFDYRLHMAVADKWVE 1 1 OKRDEDWKMG 
^510 *-520 *-530 ^540 ^550 *560 *-570 

^220 *230 ^240 ^250 ^260 ^270 *280 

125-94 pro D I VHTLTNRRWLEKCV AYAESHDQALVGDKT I AFWLMDKDMYDFMARDRPSTPL I DRG1 ALHKM I RL I TM 
DIVH LTNRRWLEKCV: YAESHDQALVGDKT I AFWLMDKDMYDFMA DRPSTPL I DRG: ALHKMIRL ITM 
1 16 pro D I VHMLTNRRWLEKCVS YAESHDQALVGDKT I AFWLMDKDMYDFMALDRPSTPL I DRG V ALHKM I RL I TM 
^580 ^590 *600 *610 ^620 *630 *-640 

^290 *300 <r310 ^320 *330 ^340 *-350 

125-94 pro GLGGEGYLNFMGNEFGHPEWI DFPRGDRHLPNGKV I PGNNHSYDKCRRRFDLGDADYLRYHGMQEFDQAM 
GLGGEGYLNFMGNEFGHPEWI DFPRGD HLP: GK : PGNM. SYDKCRRRFDLG: : . . LRYHGMQEFDQA: 
116 pro GLGGEGYLNFMGNEFGHPEWI DFPRGDLHLPSGKF VPGNNYSYDKCRRRFDLGNSKHLRYHGMQEFDGA I 
*650 ^660 *670 ^680 *690 *700 *710 

^360 ^370 *3B0 ^390 ^400 ^410 ^420 

125-94 pro QHLEEAYGFMTSEHQY I SRKDEGDR I I VFERGNLVFVFNFHWTNSYSDYRVGCFKSGKYK I VLDSDDGLF 
GHLEEAYGFMTSEHQY I SRKDE DR I I VFERGNLVFVFNFHWT: SYSOYRVGC: K: GKYK I VLDSDD LF 
1 16 pro QHLEEAYGFMTSEHQY I SRKDERDR I I VFERGNLVFVFNFHWTSSYSDYRVGCLKPGKYK I VLDSDDPLF 
P *720 ^730 *740 ^750 ^760 *770 ^780 

^430 ^440 ^450 <-460 *470 

125-94 pro GGFNRLSHDAEHFTFDGWYDNRPRSFMVYAPSRTAVVHALVEDEENEAENEVES 
GGF RLSHDAEHF: F: GWYDNRPRSFMVY: P. RTAVV. ALVEDE : : . : V. : 
1 16 pro GGFGRLSHDAEHFSFEGWYDNRPRSFMV YTPCRT AW Y ALVEDE VENEVEPV AG 

^790 ^800 ^810 *-820 ^830 
Fig.9.



Bcl I Nco I



400 



AT6GACAAGGATATGTATGACTTCATGGCTCTTGACAGACCATCTACTCCTCTCATAGATCGTGGAGTAECATTGCACAAAATGATCAGGCTTATTACCA — 

100

TACCTGTTCCTATACATACTGAAGTACCGAGAACTGTCTGGTAGATGAGGAGAGTATCTAGCACCTCATCGTAACGTGTTTTACT AGTCCGAATAATGGT 

MDKDMYDFMALORPSTPLIDRGVALHKMIRLIT 

TGGGATTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACACCCCGAGTGGATTGATTTTCCAAGAGGTGATCTACATCTTCCCAGTGG 

200

ACCCTAATCCGCCTCTTCCTAT AAACTT AAAATACCCTTTACTTAAACCTGTGGGGCTCACCTAACTAAAAGGTTCTCCACTAGATGTAGAAGGGTCACC 

MGLGGEGYLNFMGNEFGHPEWtDFPRGDLHLPSG 

EcoR V Bcl I

TAAATTTGTTCCTGGGAACAATTACAGTTATGATAAATGCCGGCGTAGGTTTGATCTAGGCAATTCAAAGCGTCTGAGATATCATGGAATGCAAGAGTTT 

300

ATTTAAACAAGGACCCTTGTTAATGTCAATACTATTTACGGCCGCATCCAAACTAGATCCGTTAAGTTTCGCAGACTCTATAGTACCTTACGTTCTCAAA 

KFVPGNNYSYDKCRRRFDLGNSKRLRYHGMQEF 

GATCAAGCAATTCAGCATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAATACATATCACGGAAGGATGAAAGGGATCGGATCATTGTCTTCG 

I 1 1 i 1 l ' i ' 1 ■ 1 ' 1 ' i » 1 1 I 

CTAGTTCGTTAAGTCGTAGAACTTCTTCGGATACCAAAGT ACTGAAGACTCGTGGTTATGTATAGTGCCTTCCTACTTTCCCTAGCCTAGTAACAGAAGC 

OQA I QHLEEAYGFMTSEHQY I SRKOERDR I I V F 

AGAGGGGAAACCTCGTTTTTGTATTCAATTTTCATTGGACTAGCAGCTATTCGGATTACCGAGTTGGCTGCTTAAAGCCAGGAAAGTACAAGATAGTCTT 

500

TCTCCCCTTTGGAGCAAAAACAT AAGTTAAAAGTAACCTGATCGTCGATAAGCCTAATGGCTCAACCGACGAATTTCGGTCCTTTCATGTTCTATCAGAA 

ERGNLVFVFNFHWTSSYSDYRVGCLKPGKYK I VL 

GGATTCAGATGATCCTTTGTTTGGAGGCTTTGGCAGGCTTAGTCATGATGCAGAGCACTTCAGCTTTGAAGGGTGGTACGATAACCGGCCTCGATCCTTC 

600

CCTAAGTCTACT AGGAAACAAACCTCCGAAACCGTCCGAATCAGTACTACGTCTCGTGAAGTCGAAACTTCCCACCATGCTATTGGCCGGAGCTAGGAAG 

DSDDPIFGGFGRLSHDAEHFSFEGWYDNRPRSF 

AT6GTGTACACACCATGTAGAACAGCAGTGGTCTATGCTTTAGTGGAGGATGAAGTGGAGAATGAAGTGGAACCTGTCGCCGGTTAAGATATATCTTAGC 

700

TACCACATGTGTGGTACATCTTGTCGTCACCAGATACGAAATCACCTCCTACTTCACCTCTTACTTCACCTTGGACAGCGGCCAATTCTATATAGAATCG 

MVYTPCRTAVVYALVEDEVENEVEPVAG. 

AACAGGTTCTGAAGCAGGAATGCCATTATTGATCTTCCTATGTGCATCTGCGTTGAACGAAATATATTGAGCCTATAATTTGATGTCACGGTCCTTGCAG 

800

TTGTCCAAGACTTCGTCCTTACGGTAATAACTAGAAGGATACACGTAGACGCAACTTGCTTTATATAACTCGGATATTAAACTACAGTGCCAGGAACGTC 

ATTTCCATCCTGGTTCTTGGTATTTTGTTGTCATGATAAACATAATCAAAGACCAATAGGAAACGCAGGGTTACATGCTAGCTTCCATCATCATAGGGAG 

900

TAAAGGTAGGACCAAGAACCATAAAACAACAGTACTATTTGTATTAGTTTCTGGTTATCCTTTGCGTCCCAATGTACGATCGAAGGTAGTAGTATCCCTC 

Sac I 

CTCAGACCTCCTAAACCATAAATCTTCAAGCTGCCTGCGTTCGGTAGTATGTTATGTGGT ACTTTGCAATCTTAAATTATCATGATCGCTGTGGATGCTA 



GAGTCTGGAGGATTTGGTATTTAGAAGTTCGACGGACGCAAGCCATCATACAATACACCATGAAACGTTAGAATTTAATAGTACTAGCGACACCTACGAT 

ACTATGACAATTTTGTATATATGCCAACGAGGATTTTAAGTTTTAAAAAAAAAACAAAAAAAATCC ATG 

1069

TGATACTGTT AAAACAT ATATACGGTTGCTCCTAAAATTC AAAATTTTTTTTTTGTTTTTTTTAGGT AC 
Fig. 10.



400 



500 



Cla I Kpn I

TATGGATTGACATCGATAAT ACGACTCACTATAGGGATT7 7 TTTTTTTTTTTTTTTTTGT AGTTTTGGGT ACCATGTC ACAAACTTTTTTGC ACC7AGCA 

100

ATACCTAACTGTAGCT ATTATGCTGAGTGATATCCCTAAAAAAAAAAAAAAAAAAAAACATCAAAACCCATGGTACAGTGTTTGAAAAAACGTGGATCGT 

SFGYHVTNFFAPS 

GCCGATTTGGAAC T CC TGATGATT TG AAGTCTTT AATAGAT AAAGCTCATGAGTT AGGGCTGCTTGTTCTCATGGATATTGTTCATAGCCATGCGTCAAA 

200

CGGCTAAACCTTGAGG ACTACT AA ACTTCAGAAATT ATCT ATTTCGAGTACTCAATCCCGACGAAC AAGAGTACCTAT AACAAGTATCGGTACGCAGTTT 

5RFGTPD0LKSL 1 DKAKELGLLVLMD 1 VKSHASN 

TAATACGTTGGATGGGCTGAACATGTTTG ATGGT ACGGAT AGTC ACTACTTCCACTCCGGATC ACGGGGTCATCATTGGTTGTGGGACTCTCGCCTTTTC 

300

ATTATGCAACCT AC CC GACT TGT AC A AACTACC AT GCCTATCAGTGATGAAGGTGAGGCCTAGTGCCCCAGTAGTAACCAACACCCTGAGAGCGG AAA AG 

NTLDGLNHFDGTDSHYFHSGSRGHHWLWDSRLF 

AACTATGGAAGCTGGGAGGTGC T AAGAT TTCTTCTTTC AAATGC AAGATGGTGGTTGGAAGAGTACAGGTTTG ATGGTTTT AGATTTGATGGGGTGACTT 

, I i I i | , i 1 H 1 i i I i i i 1 ■ 1 ■ h 

TTGATACCTTCGACCCTCCACGATTCT AAAGAAGAAAGTTT ACGTTCTACCACC AACCTTCTC ATGTCCAAACTACC AAAATCTAAACTACCCCACTGAA 

NYGSWEVLRFLLSNARWWLEEYRFDGFRFDGVT 

Nco I Sca I

CCATGATGTACACTCCCCATGGGTTGCAGGTAGCTTTTACTGGCAACTACAA TGAGTACTTTGGATATGCAACTGATGTAGATGCTGTGATTTATTTGAT 
GGTACTACATGTGAGGGGTACCCAACGTCCATCGAAAATGACCGTTGATGTTACTC ATGAAACCTATACGTTGACT ACATCT ACGACACT AAATAAACTA 

SMMYTPHGLOVAFTGNYNEYFGYATOVDAV I Y L M 

GCTTGTGAATGATATGATTCACGGTCTTTTCCCTGAGGCTGTTACCATTGGTGAAGATGTTAGCGGAAAGCCAACATTTTGCATTCCAGTGGAAG ATGGT 

600

CGAACACTTACT AT ACTAAGTGCCAGAAAAGGGACTCCGACAATGGTAACCACTTCTACAATCGCCTTTCGGTTGT AAAACGT AAGGTCACCTTCTACCA 

LVNDMI HGLFPEAVT IGEDVSGKPTFC IPVEOG 

GGTGTTGGATTTGATTACCGTCTCCACATGGCCATT GCCGAT AAATGGATTGAGATTCTT AAGAAGAGAGATGAGGACTGGAAAATGGGTGACAT TGTGC 

700

CCACAACCT AAACT AATGGC AGAGGTGT ACCGGT AACGGCT ATTTACCTAACTCT AAGAATTCTTCTC TCTACTCCTGACCTTTT ACCCACTGT AACACG 

GVGFDYRLHMA IADKWIE I LKKROEDWKHGD I V 

ATACACTCACCAAC AG AAGGTGGTTGGAAAAATGTGTTGCTT ATGCTGAAAGTC ATGACC AAGCTCTTGTTGGTGACAAAACT ATTGC ATTTTGGCTGAT 

, j , 1 , , a ■ . i ■ ■ i i 1 ■ ■ . 'l 1 1 » ' — t- 

TATGTGAGTGGTTGTCTTCC ACCAACCTTTTTAC ACAACGAATACGACTTTCAGTACTGGTTCGAGAACAACCACTGTTTTGATAACGTAAAACCGACTA 

HTLTNRRWLEKCVAYAESHOOALVGDKT I AFWLM 

Bcl I Nco I

GGACAAGGACATGTACGACTTCATGGCTCGTGACAGACCATCTACTCCTCTTATAGATCGTGGAATAGCATTGCACAAAATGATCAGGCTTATTACCATG 

900

CCTGTTCCTGTACATGCTGAAGTACCGAGCAC TGTCTGGT AGATGAGGAGAATATCTAGCACCTTATCGT AACGTGTTTT ACTAGTCCGAAT AATGGT AC 

DKDMYOF MARDRPSTPL I DRG1 ALHKM I RL I T M 

GGCTTAGGCGGAGAAGGATATT TGAATTTT ATGGGAAATGAATTTGGACATCCTGAGTGGATTGATTTTCCAAGAGGGGATCGACATCTGCCCAATGGTA 

1000

CCGAA TCCGCCTCTTCCT AT AA AC TT AAAATACC C TT T AC TT AAAC CTGT AGGAC TC ACC T AAC T AAAAGGTTCTC CCCT AGC TGT AGACGGGTT ACC AT 

GLGGEGY LNFMGNEFGHPEW1 DFPRGDRHLPNG 
Fig. 10 (Cont).
EcoR V Bcl I

AAGTAATTCCACGGAACAACCACAGTTATGATAAATGCCGTCGTAGATTTGATCTAGGTGATGCAGACTATCTAAGATATCATCGAATGCAAeAGTTTGA 
, ■ l . 1 1 i ' ^ ' 1 1 1 i 1 . 1 1 h 

TTCATTAAGGTCCCTTGTTGGTGTCAATACTATTTACGGCAGCATCTAAACTAGATCCACTACGTCTGATAGATTCTATAGTACCTTACGTTCTCAAACT 
KV [PGNNHSYDKCRRRFDLGDADYLRYHGMQEFD 

TCAGGCAATGCAACATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAGTATATATCACGGAAGGATGAAGGAGATCGGATCATTGTCTTTGAG 

.ii i . i I 1 ' l ' i t ■ 1 ■ 1 i i ■ , ) i ■ i 

AGTCCGTTACGTTGTAGAACTTCTTCGGATACCAAAGTACTGAAGACTCGTGGTCATATATAGTGCCTTCCTACTTCCTCTAGCCTAGTAACAGAAACTC 
QAMGHLEEAYGFMTSEHQY I SRKDEGOR I 1VFE 

AGGGGAAACCTTGTTTTTGTATTCAACTTTCATTGGACTAACAGCTATTCAGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGG 
1300

TCCCCTTTGGAACAAAAACATAAGTTGAAAGTAACCTGATTGTCGAT AAGTCTAATGGCTCAACCGAC6AAGTTCAGTCCTTTCATGTTC7AACAAAACC 

RGNLVFVFNFHWTNSYSDYRVGCFKSGKYK I VL 

ACTCGGATGATGGCTTGTTTGGAGGCTTCAACAGGCTTAGTCATGATGCCGAGCACTTCACCTTTGACGGGTGGTATGAT AACCGGCCTCGGTCCTTCAT 

1400

TGAGCCTACTACCGAACAAACCTCCGAAGTTGTCCGAATCAGTACTACGGCTCGTGAAGTGGAAACTGCCCACCATACTATTGGCCGGAGCCAGGAAGTA 

DSDDGLFGGFNRLSHDAEHFTFDGWYDNRPRSFM 

GGTATATGCACCATCTAGGACAGCAGTGGTCCATGCTTTAGTAGAAGATGAAGAGAATGAAGCAGAGAATGAAGTAGAAAGTGAAGTGAAACCAGCCTCC 

1500

CCATATACGTGGTAGATCCTGTCGTCACCAGGTACGAAATCATCTTCTACTTCTCTTACTTCGTCTCTTACTTCATCTTTCACTTCACTTTGGTCG6AGG 

VYAPSRTAVVHALVEDEENEAENEVESEVKPAS 

BamH I Hinc II 

GGCTGAGATAGATATTTAGTAAGAGGATCCCCTAAAGCAGGAATGGTTAACCTGTGCATCTGCATTGAACGACGTATATTGAGACTTGAATTGATTTGCT 

1600

CCGACTCTATCTATAAATCATTCTCCTAGGGGATTTCGTCCTTACCAATTGGACACGTAGACGTAACTTGCTGCATATAACTCTGAACTTAACTAAACGA 



G 



Ssp I 



Nsi I
Bcl I



GCTCAGGACACAGAATATTAATTCCAAGGCTCAAGGCAGAGATACACGCCATAATGCATGATCATATGAAAGCTCCCCAACTTGTAAATCATTTAGCAAG 
1700

CGAGTCCTGTGTCTTATAATTAAGGTTCCGAGTTCCGTCTCTATGTGCGGTATTACGTACTAGTATACTTTCGAGGGGTTGAACATTTAGTAAATCGTTC 

Sca I Nco I

CTGCGTGCACTCTGTAAATTATATGTAGTACTTTGGCAAGTCACGTTATTATGGATACCATGGATGTCCGCTAGGAAAAATTTTGTGTATACGCCTACTA 

1800

GACGCACGTGAGAC ATTTAATATACATCATGAAACCGTTC AGTGCAAT AATACCTATGGTACCTACAGGCGATCCTTTTTAAAACACATATGCGGATGAT 

Xmn I

GGATTTTTAAATCTCGCATGTTCCACAT AAAGTGGTGGTTGAATGTTGCGCGACTATTTTTGAGTAAAATGATTGAAGTT ATTCTTCACTTGGGCCTGTG 

1900

CCTAAAAATTTAGAGCGT ACAAGGTGTATTTCACCACCAACTTACAACGCGCTGATAAAAACTCATTTTACT AACTTCAAT AAGAAGTGAACCCGGAC AC 

AAAAAAAAAAAAAAAAAAA 
1919
Fig. 13.



AGTCAArTCGAGCTCGGTACCCGGGGArCCGATTCGCATTTCTCGCTATTGCTTTCCGTTTATTTCCATATATAAAATATCAAATCTAATCACTrCCGCCATTTCTATCTCTCTCCAAAC 

120

TCACTTAAGCTCGAGCCATGGGCCCCTAGGCTAAGCGTAAAGAGCGATAACGAAAGGCAAATAAAGGrATATATTTTATAGTTTAGATTAGTGAACCCGGTAAAGATAGAGAGAGGTTTG 

Nco I

TCTCACCGAAATGGTATACTACACTCTATCACGCATACGTTTTCCrTGTGCACCTrCACTCTACAAArCTCAGCTCACCAGCTTCCATGGCGCrCGAAGGACCrCTTCTGGCCTTrCCTT 



360 



AGAGTGGCTTTACCATATGATGTGACATAGTCCGTATGCAAAAGGAACACGTGGAAGTGAGATGTTTAGAGTCGAGTGGrCGAAGGTACCGCCAGCTTCCTGGAGAAGACCGGAAAGGAA 
ttVYYTVSGIRFPCAPSLYKSQLTSFHGGRRTSSGLSF 

CCTCTTCAASAACGAGCTGTTTCCTCGGAAGATCTTTGCTGSAAAGtCCTCTTATGAATCTGACTCCT CAAA TTTAACTGTCTCTGCATCTGAGAAGGT CCTrSTTCCTGATGATCAGAT 
GGAGAACTTCTTCCTCGACAAAGGAGCCTTCTAGAAACGACCTTTCAGGAGAATACTTAGACTGAGGAGTTTAAATTGACAGAGACGTAGACTCTTCCAGGAACAAGGACTACTAGTCTA 

LLKKELFPRK IFAGKSSYESDSSMLTVSASEKVLVPDDa t 

BstX I 

TGATGGCTCTTCTTCTTCAACArATCAATrAGAAACCACTGGCACAGTTTTGGAGGAATCCCAGGTTCTTGGTGATGCAGAGAGTCTTGTGATGGAAGATCATAAGAATGTTGAGGAGGA 

480

ACTACCGAGAAGAAGAAGTTGTATAGTTAATCTTTGGTGACCGTGTCAAAACCTCCTTAGGGTCCAAGAACCACTACGTCTCTCAGAACACTACCrTCTACTATTCTTACAACTCCTCCT 

QGSSSSTYQLETTGTVLEESOVLGOAESLVttEDDKNVEED 

Hind III

TGAAGTAAAAAAAGAGTCGGTTCCATTGCATGAGACAATTAGCATTGGAAAAAGTGAATCTAAACCAAGGTCCATTCCTCCACCTGGCAGTGGGCAGAGAATATATGACATAGATCCAAG 



ACTTCATTTTTTTCTCAGCCAAGGTAACGTACTCTGTTAATCGTAACCTTTTTCACTTAGATTTGGTTCCAGGTAAGGAGGTGGACCGTCACCCGTCTCTTATATACTGTATCTACGTTC 
EVKKESVPLHETtS i GKSESKPRSI PPPGSGQR I YD I DPS 

CTTGGCAGGTTTCCGTC AGC ATC TTGACT ACCGATATTCACAGrACAAAAGGCTGCGTGAGGAAATTGACAAGTATGAAGGTGGTTTGGATGCATTCTCTCGTGGATTTGAAAAGTTTGG 
GAACCGTCCAAAGGCAGTCGTAGAACTGATGGCTATAAGTGTCATGTTTTCCGACGCACTCCTTTAAC TGTTCATACTTCCACCAAACCTACGTAAGAGAGCACCTAAACTTTTCAAACC 

LAGFRQHLDYRYSQYKRLREE I DKYEGGLOAFSRGFEKFG 

TTTCTTACGCAGTGAAACAGGAATAACTTATAGGGAATGGGCACCTGGAGCTACGTGGGCTGCACTTATTGGAGATTTCAACAATTCGAATCCT AATGCAGATCTCArGACTCGGAATGA 
AAAGAATGCGTCACTTTGTCCTTATTGAATATCCCTTACCCGTGGACCTCGATGCACCCGACGTGAATAACCTCTAAAGTTGTTAACCTTAGGATTACGTCTACAGTACTGAGCCTTACT 

FLRSETG I TYREWAPGATWAALf GDFNNWNPNAD VMTRNE 
GTTTCGTCTCTGGGAGATT TTTTTGCC AAATAACGCAGATGGTTCACCACCAATT^ 

CAAACCACAGACCCTCTAAAAAAACGGTTTATTGCGTCTACCAAGTGGTGGTTAAGGAGTACCAAGAGCTCATrTCTATCCCTACCTATGAGGTAGACCGTAGTTTCTAAGTTAAGCACG 

FGVWE IFLPNNADGSPP IPHGSRVK I R M D T P 5 G tKDS I P A 

TTGGATCA AGTTCTCACTTCAGGCACCTGGTGAAATCCCATACAATGCCATATACTATGATCCACCAAAGGAGGAGAAGTATGTGTTCAAACATCCTCAGCCAAAGAGACCAAAATCACT 
AACCTAGTTCAAGAGTCAAGTCCGTGGACCACTTTAGGGTATGTTACGGTATATGATACTAGGTGGTTTCCTCCTCTTCATACACAAGTTTGTAGGAGTCGGTTTCTCTCGTTTTAGTGA 

VtKFSVQAPGEIPYNA ! YYQPPKEEXYVFKHPGPKRPKSL 

Nde I Hind III 

TACGATTTATGAATCTCATGTTGGGATGACTAGTATGCAGCCAATAATTAACACATATGCCAACTTTAGACATGATATGCTTCCTCGCATCAAAAAGCTTGGCTACAATGCTGTTCAGAT
ATCCTAAATACTTAGAGTACAACCCTACTCATCATACCTCGGTTATTAATTGTGTATACGGTTGAAATCTCTACTATACGAAGGAGCGTAGTTTTTCGAACCGATGTTACGACAAGTCTA 

RI YESHVGMSSMEP I I NTYANFRDDHLPR [KKLGYNAVQ I 

Kpn I 

CATGGCTATTCAAGAGCATTCCTATTATGCTAGTTTTGGGTACCATGTCACAAAC TTTTTTGCACCTAGCAGCCGATTTGGAACTCCTGATGATTTGA AGTCTTTAATAGATAAAGCTCA 
GTACCGATAAGTTCTCGTAAGGATAATACGATCAAAACCCATGGTACAGTGTTTGAAAAAACGTGGATCGTCGGCTAAACCTTGAGGACTACTAAACTTCAGAAATTATCTATTTCGAGT

H A I QEHSYYASFGYHVTNFFAPSSRFGTPODLKSLfOXAH 

TGAGTTAGCGCTGCTTGTTCTCATGGATATrGTTCArAQCCATGCG TCAAATAAT ACGrTGGATGGGCTGAACATGrTTGATGGTACGGATAGT CACTACTTCCACTCCGGATCACGGCG 
ACTCAATCCCGACGAACAAGAGTACCTATAACAAGTATCGGTACGCAGTTTATTATGCAACCTACCCGACTTGTACAAACTACCATGCCTATCAGTGATGAAGGTGAGGCCrAGTGCCCC 

ELGLLVLflO t VHSHASNMTLDGLNMFDGTOSHYFKSGSRG 
TCATCATTGGTTGTGCGACTCTCCCCTTTTCAACrATGGAAGCTGGGAGCTGCTAAGATTTCTTCTTTCAAATGCAAGATGGTGCTTGGAACACTACAGCTTrCATGSTTTTAGATTTQA
ACTAGTAACCAACACCCTGAGAGCGGAAAAGTTGATACCTTCGACCCTCCACGATTCTAAAGAAGAAAGTTTACGTTCTACCACCAACCTTCTCATGTCCAAACTACCAAAATCTAAACT

HHtfLWDSRLFNYGSWEVLRFLLSNARVWLEEYRFDGFRFO 

Nco I Sca I

TGGGGTGACTTCCATGATGTACACTCCCCATGGGTTGCAGGTAGCTTTTACTGGCAACTACAATGAGTACTTTGGATATGCAACTGATGTAGATGCTGTCATTTATTTGATCCTTGTGAA 
ACCCCACTGAAGGTACTACATGTGAGGGGTACCCAACGTCCATCGAAAATGACCGTTGATGTTACTCATGAAACCTATACGTTGACTACATCTACGACACTAAATAAACTACGAACACTT

G V T S M M Y TPHGLQ VAFTGNYNEYFCYATDVDAV I YLMLVN 

TGATATGATTCACGGTCTTTTCCCTGACGCTGTTACCATTGGTGAAGATGTTAGCCGAAAGCCAACATTTTGCATTCCAGTGGAAGATGGTGGTGTTGGATTTGATTACCGTCTCCACAT

1800

ACTATACTAAGTGCCAGAAAAGGGACTCCGACAATGGTAACCACTTCTACAATCGCCTTTCGGTTGTAAAACGTAAGGTCACCTTCTACCACCACAACCTAAACTAATGGCAGAGGTGTA 

DM I HGLFPEA V T I GEOVSGKPTFC E PVEDGGVCFDYRLHM 

GGCCATTGCCGATAAATGGATTGAGATTCTTAAGAAGAGAGATGAGGACTGGAAAATGGGTCACATTGTGCATACACTCACCAACAGAAGGTGGTTGGAAAAATGTCTTGCTTATGCTGA 
CCGGTAACGGCTATTTACCTAACTCTAAGAATTCTTCTCTCTACTCCTGACCTTTTACCCACTGTAACACGTATGTGAGTGGTTGTCTTCCACCAACCTTTTTACACAACCAATACGACT 

AI ADKWI EI LKKRDEQWKMGDIVHTLrNRRVLEKCVAYAE 

AAGTCATGACCAAGCTCTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGACATGTACGACTTCATGGCTCGTGACAGACCATCTACTCCTCTTATAGATCGTCGAATAGC 
TTCAGTACTCGTTCGAGAACAACCACTGTTTTGATAACGTAAAACCGACTACCTGTTCCTCTACATGCTGAAGTACCGAGCACTGTCTGGTAGATGAGGAGAATATCTAGCACCTTATCG

SHDOALVGDKT I AFWLttDKO/IYOFNARDRPSTPLlDRGI A 

Bcl I Nco I

ATTGCACAAAATGATCAGGCTTATTACCATGGGCTTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACATCCTGAGTGGATTGATTTTCCAAGAGGGCATCGACATCT
TAACGTGTTTTACTAGTCCGAATAATGGTACCCGAATCCGCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCTGTAGGACTCACCTAACTAAAAGGTTCTCCCCTACCTGTAGA

LHKM IRL I TMGLGGEGYLNFMGNEFGHPEV I DFPRGORHL 

JBctl 

GCCCAATGGTAAAGTAATTCCAGGGAACAACCACAGTTATGATAAATGCCGTCSTAGATTTG 

CGGGTTACCATTTCATTAAGGTCCCTTGTTGGTGTCAATACTATTTACGGCACCATCTAAACTAGATCCACTACGTCTGATAGATTCTATAGTACCTTACGTTCTCAAACTAGTCCGTTA

PMGKV IPGNNHSY D K CRRRFOLGOAQYLR YHGMQEFDQAM 

GCAACATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAGTATATATCACGGAAGGATGAAGGAGATCGGATCATTGTCTTTGAGAGGGGAAACCTTCTTTTTGTATTCAACTT 
CGTTGTAGAACTTCTTCGGATACCAAAGTACTGAAGACTCGTGGTCATATATAGTGCCTTCCTACTTCCTCTAGCCTAGTAACAGAAACTCTCCCCTTTGGAACAAAAACATAACTTGAA

QHLEEAYGFMTSEHQY t SRKOEGDft I I VFERGNLVFVFNF 

TCATTGCACTAACAGCTATTCAGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGGACTCCGArGATGGCTT GTTTCGAGGCTTCAACAGGCTTAGTCATGATCC 
AGTAACCTGATTGTCGATAAGTCTAATGGCTCAACCGACGAAGTTCAGTCCTTTCATGTTCTAACAAAACCTGAGCCTACTACCGAACAAACCTCCGAAGTTGTCCGAATCAGTACTACG 

HWTNSYSDYR V GCFK5GKYK 1 VLDSDD6LFGGFNRLSH0A 

CGAGCACTTCACCTTTGACGGCTGGTATGATAACCGGCCTCGGTCCTTCATGGTATATGCACCATCTAGGACAGCAGTGGTCTATGCTTTAGTACAAGATCAACAGAA
GCTCGTGAAGTGGAAACTGCCCACCATACTATTCGCCCGAGCCAGGAAGTACCATATACGTGGTAGATCCTGTCCTCACCAGATACGAAATCATCTTCTACTTCTCTTACTTCGTCTCTT

EHFTFDGWYONRP RSFHVYAPSRTAV VYALVEDEENEAEN 

BamH I Hinc II

TGAAGTAGAAAGTGAAGTGAAACCAGCCTCCGGCTGAGATAGATATTTAGTAAGAGGATCCCCTAAAGCAGGAATGGTTAACCTGTGCATCTGCATTGAACGACGTATATTGAGACTGGA 
ACTTCATCTTTCACTTCACTTTGGTCGGAGGCCGACTCTATCTATAAATCATTCTCCTAGGGGATTTCGTCCTTACCAATTGGACACGTAGACGTAACTTGCTGCATATAACTCTGACCT

EVESEVKPASG 

Sal I

Nde I Xba I Hinc II Pst I

AATCCATATGACTAGTAGATCCTCTAGAGTCGACCTGCAGGCATG 

2805

TTAGGTATACTGATCATCTAGGAGATCTCAGCTGGACGTCCGTAC 